Research
Photo: Lei Liang
How does natural genetic variation shape plant responses to a changing climate? Many traits central to plant performance — including stress tolerance, growth, flowering time, reproduction, and yield stability — are complex traits controlled by many genes whose effects often depend strongly on the environment.
Our research seeks to understand how genomes and environments interact to shape these traits, and how this knowledge can be used to predict plant performance, improve breeding, and support the conservation of adaptive diversity.
We study how plants respond to drought, heat, and other climate-related challenges by linking genomic variation to phenotypic plasticity across environments. By combining genomics, quantitative genetics, and computational modelling, we aim to uncover why different genotypes respond differently to the same stress, how adaptive responses evolve, and how environmentally contingent genetic effects can be predicted.
A long-term goal of the group is to turn large-scale biological data into predictive insight for plant breeding and biodiversity management. We use genomic, phenotypic, and environmental information to advance climate-resilient breeding and to help preserve the genetic diversity that underpins future adaptation in natural, agricultural, and forest systems.
Our research
Genetic and genomic basis of climate-responsive trait plasticity
A central focus of the group is to understand how genetic variation shapes phenotypic plasticity under environmental stress. We are particularly interested in complex traits whose expression changes across environments, such as flowering time, reproductive success, growth, and stress tolerance.
Using Arabidopsis thaliana as a model system, alongside crop and forest species where appropriate, we investigate how plants differ in their responses to drought, heat, and other climate-related stresses. This allows us to dissect genotype-by-environment interactions, identify the genetic architecture of plastic responses, and uncover the biological mechanisms that link genome variation to trait variation across environments.
Through this work, we aim to understand how adaptive responses are built, why genotypes differ in environmental sensitivity, and how complex traits evolve under heterogeneous and changing climates.
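As a purely illustrative sketch of what environmental sensitivity means quantitatively, plasticity can be summarised as the slope of a linear reaction norm: the regression of a genotype's trait value on an environmental covariate. The example below assumes a simple least-squares fit and uses made-up flowering times and hypothetical genotype names; it is not the group's analysis pipeline.

```python
from statistics import mean

def reaction_norm(env_values, phenotypes):
    """Least-squares intercept and slope of a phenotype on one environmental covariate."""
    x_bar = mean(env_values)
    y_bar = mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in env_values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(env_values, phenotypes))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical flowering times (days) for two genotypes at three temperatures (deg C).
temperatures = [10.0, 16.0, 22.0]
flowering_days = {
    "genotype_A": [60.0, 48.0, 36.0],  # flowers earlier as temperature rises
    "genotype_B": [50.0, 50.0, 50.0],  # insensitive to temperature
}

# The slope is the plasticity estimate: days of flowering-time change per degree.
plasticity = {g: reaction_norm(temperatures, y)[1] for g, y in flowering_days.items()}
```

Genotypes with different slopes have crossing or diverging reaction norms, which is one simple way a genotype-by-environment interaction shows up in data.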
Genome diversity, genome structure, and adaptive variation
We develop genomic resources to better understand the diversity of plant genomes across populations, breeding materials, and evolutionary timescales. Using systems such as Arabidopsis thaliana and Nicotiana tabacum, we study genome diversity beyond single-reference genomes, with particular interest in structural variation, pan-genome diversity, and genome organization.
These approaches allow us to ask how genome structure differs across lineages, habitats, and breeding populations, and how such variation contributes to trait diversity, adaptation, and long-term resilience. By revealing previously hidden forms of genomic variation, we seek to better understand the raw material on which selection and adaptation act.
Predictive models for climate-resilient breeding and deployment
A major long-term goal of the group is to translate fundamental insights into predictive tools for agriculture, forestry, and biodiversity management. Once we understand how genomic variation shapes plant performance across environments, we can use that knowledge to improve prediction, selection, and deployment under climate uncertainty.
We develop models that integrate genomic, phenotypic, and environmental data to improve plant breeding in heterogeneous environments. This includes multi-environment genomic prediction, plasticity-informed breeding frameworks, and approaches for optimizing trial networks and phenotyping strategies.
More broadly, we are interested in how predictive models can guide both breeding and conservation: improving the efficiency of selection in crops and trees, while also helping identify and preserve adaptive genetic diversity in natural populations.
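To make the idea of multi-environment genomic prediction concrete, the following minimal sketch fits a ridge regression on marker effects, an environmental covariate, and marker-by-environment products, then predicts an untested genotype-environment combination. It is an assumption-laden toy: the function names, the penalty value, and all data are hypothetical, and it does not reproduce any specific model developed by the group.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_fit(X, y, lam):
    """Return (intercept, coefficients) minimising squared error + lam * sum(beta^2)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations with the ridge penalty added to the diagonal.
    xtx = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(bj * mj for bj, mj in zip(beta, col_means))
    return intercept, beta

def ridge_predict(X, intercept, beta):
    """Predict phenotypes for feature rows X."""
    return [intercept + sum(bj * v for bj, v in zip(beta, row)) for row in X]

def features(genotype, env):
    """Marker main effects, the environmental covariate, and marker-by-environment products."""
    return list(genotype) + [env] + [g * env for g in genotype]

# Hypothetical toy data: marker genotypes coded as 0/1/2 allele counts, grown at two
# sites described by a single standardised environmental covariate.
genotypes = {"line1": (0, 1, 2), "line2": (2, 1, 0), "line3": (1, 0, 1), "line4": (2, 2, 1)}
records = [  # (line, environmental covariate, phenotype); all values are made up
    ("line1", -1.0, 10.2), ("line1", 1.0, 12.9),
    ("line2", -1.0, 11.5), ("line2", 1.0, 11.8),
    ("line3", -1.0, 9.8), ("line3", 1.0, 11.1),
    ("line4", -1.0, 12.4),
]
X = [features(genotypes[line], env) for line, env, _ in records]
y = [pheno for _, _, pheno in records]
intercept, beta = ridge_fit(X, y, lam=1.0)

# Predict the untested combination: line4 at the second site.
prediction = ridge_predict([features(genotypes["line4"], 1.0)], intercept, beta)[0]
```

In practice the penalty would be tuned (for example by cross-validation) and the marker matrix would contain thousands of loci, but the structure of the problem, shared marker effects plus environment-dependent ones, is the same.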
Vision
We believe that understanding plant adaptation requires linking genome variation, gene regulation, phenotypic plasticity, and performance across environments within a single framework. By combining genomic discovery, quantitative genetics, and predictive modelling, our research aims to advance both climate-resilient breeding and the conservation of adaptive diversity in a rapidly changing world.
We study how natural genetic variation interacts with changing environments to shape complex trait plasticity, and how this knowledge can be used for prediction, breeding, and deployment (illustration: Yanjun Zan).
Key publications:
- Zan Y, Chen S, Ren M, Liu G, Liu Y, Han Y, Dong Y, Zhang Y, Si H, Liu Z (2025) The genome and GeneBank genomics of allotetraploid Nicotiana tabacum provide insights into genome evolution and complex trait regulation. Nature Genetics 57(4): 986-996
- Kang M, Wu H, Liu H, Liu W, Zhu M, Han Y, Liu W, Chen C, Song Y, Tan L (2023) The pan-genome and local adaptation of Arabidopsis thaliana. Nature Communications 14(1): 6259
- Han Y, Liu L, Lei M, Liu W, Si H, Ji Y, Du Q, Zhu M, Zhang W, Dai Y (2025) Divergent Flowering Time Responses to Increasing Temperatures Are Associated With Transcriptome Plasticity and Epigenetic Modification Differences at FLC Promoter Region of Arabidopsis thaliana. Molecular Ecology 34(15): e17544
- Zan Y, Carlborg Ö (2020) Dissecting the genetic regulation of yeast growth plasticity in response to environmental changes. Genes 11(11): 1279
- Zan Y, Carlborg Ö (2019) A polygenic genetic architecture of flowering time in the worldwide Arabidopsis thaliana population. Molecular Biology and Evolution 36(1): 141-154
- Han Y, Du Q, Dai Y, Gu S, Lei M, Liu W, Zhang W, Zhu M, Feng L, Si H (2025) EasyOmics: A graphical interface for population-scale omics data association, integration, and visualization. Plant Communications 6(5): 101293
Team
- April 2026 – Present: Assistant Professor at Umeå Plant Sciences Centre, Department of Plant Physiology, Umeå University, Umeå, Sweden
- June 2023 – March 2026: Assistant Professor at Tobacco Research Institute, Chinese Academy of Agricultural Sciences
- June 2019 – May 2023: Postdoc in Forest Genetics and Genomics, Swedish University of Agricultural Sciences, Sweden
- May 2018 – May 2019: Postdoc in Computational Genomics, Uppsala University, Sweden
CV Y. Zan
Publications
@article{yu_construction_2026,
title = {Construction of genomic prediction models for leaf protein content in \textit{{Nicotiana} tabacum}},
volume = {243},
issn = {0926-6690},
url = {https://www.sciencedirect.com/science/article/pii/S0926669026004772},
doi = {10.1016/j.indcrop.2026.123090},
abstract = {With its high soluble protein content, large biomass yield, and ease of cultivation, tobacco leaves show strong potential as a novel protein source for livestock. However, the genetic basis underlying leaf protein content remains poorly understood, necessitating the use of genomic prediction models to screen germplasm resources and accelerate the improvement of this trait in future breeding programs. To address this, we analyzed 2517 tobacco germplasm accessions from the Chinese National Tobacco Germplasm Resource Bank, which represent broad genetic diversity, to investigate the genetic architecture of leaf protein content and construct genomic prediction models. Tobacco leaf protein content exhibited a moderate heritability of 0.16, and association analysis identified a significant peak that explained approximately 1\% of the phenotypic variance. We further evaluated the performance of 16 mainstream genomic prediction models using five-fold cross-validation. Among these models, best linear unbiased prediction (rrBLUP) model achieved the highest prediction accuracy (0.87). In addition, rrBLUP required less computational time and resources compared with other models, highlighting its stability and efficiency. Field validation (Longshan County, Hunan Province, 111°37′45″E, 27°30′52″N) confirmed the robustness and accuracy of our genomic selection model. Overall, our results demonstrate that genomic prediction can enable rapid screening of tobacco germplasm resources and substantially enhance the efficiency of developing high-protein varieties.},
urldate = {2026-04-10},
journal = {Industrial Crops and Products},
author = {Yu, Le and Guo, Linjie and Liu, Li and Ren, Min and Cheng, Lirui and Liang, Lei and Yang, Aiguo and Si, Huan and Cai, Changchun and Zan, Yanjun},
month = apr,
year = {2026},
keywords = {Genome-wide association study, Genomic selection, Germplasm, Leaf protein content, Nicotiana tabacum},
pages = {123090},
}
@article{han_development_2026,
title = {Development and application of a genotyping by target sequencing single-nucleotide polymorphism array panel in {Salix} suchowensis},
volume = {27},
issn = {1471-2164},
url = {https://doi.org/10.1186/s12864-026-12690-2},
doi = {10.1186/s12864-026-12690-2},
abstract = {Salix suchowensis is an important species of Salix, known for its rapid growth property and wide application in environmental construction, ecological restoration, wicker production, and biomass energy production. Due to its significance as a sustainable biological resource, S. suchowensis has been the centre of intensive breeding. However, rapid improvement of growth and biomass has been hindered by a lack of genomic resources. To address this limitation, we designed a liquid-phase probe array by genotyping by target sequencing technology. Using whole-genome resequencing data, a total of 39,076 SNPs were selected for the array panel, consisting of trait-associated SNPs, intragenic SNPs, and intergenic SNPs. This panel was validated by genotyping 550 new samples, demonstrating high call rates and effective capture of the population structure. Genome-wide association analysis identified 72 SNPs associated with plant height and ground diameter. Additionally, the array panel shows a high potential for genomic selection, with high prediction accuracy for various traits. These results highlight the efficiency of this panel in capturing genomic variations that are highly valuable for future genetic research and breeding applications.},
language = {en},
number = {1},
urldate = {2026-04-17},
journal = {BMC Genomics},
author = {Han, Yu and Gu, Shaobo and Zhu, Mingjia and Liu, Wei and Feng, Landi and Yin, Tongming and Gao, Xuemeng and Zan, Yanjun and Huang, Rui and Ji, Yan and Liu, Jianquan},
month = mar,
year = {2026},
keywords = {Genome-wide association study, Genomic selection, Ground diameter, Liquid-phase probe array, Plant height},
pages = {355},
}
@article{guo_genomic_2026,
title = {Genomic selection for tar content in {Nicotiana} tabacum: genetic architecture analysis and model evaluation},
volume = {17},
issn = {1664-462X},
shorttitle = {Genomic selection for tar content in {Nicotiana} tabacum},
url = {https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2026.1721129/full},
doi = {10.3389/fpls.2026.1721129},
abstract = {Introduction: Despite its economic importance, reducing tobacco tar content remains challenging due to its complex genetic basis. Methods: Here, we evaluated 436 diverse tobacco accessions to characterize the genetic architecture of tar content and develop an optimized genomic selection strategy. Based on these findings, sixteen genomic prediction models were assessed using five-fold cross-validation. Results: Genome-wide association analysis detected no major-effect loci, and regional heritability mapping revealed localized enrichment of small-effect variants, particularly on chromosome 17, indicating a predominantly polygenic architecture. rrBLUP achieved the highest prediction accuracy (0.84) with superior computational efficiency, followed closely by GBM (0.83). The robustness of rrBLUP was further confirmed in an independent panel of 36 accessions (Pearson r = 0.888). Discussion: Together, our results demonstrate that tobacco tar content is governed by dispersed small-effect loci with regional aggregation and establish rrBLUP as a robust and practical model for genome-wide prediction, providing methodological guidance for low-tar tobacco breeding.},
language = {English},
urldate = {2026-04-10},
journal = {Frontiers in Plant Science},
publisher = {Frontiers},
author = {Guo, Linjie and Kong, Bo and Chen, Hui and Ren, Min and Cheng, Lirui and Yang, Aiguo and Liang, Lei and Zan, Yanjun and Si, Huan and Cai, Changchun},
month = mar,
year = {2026},
keywords = {genome-wide association analysis, genomic selection, model evaluation, tar content, tobacco},
}
@article{zhu_mmgs_2026,
title = {{MMGS}: a novel genomic prediction framework to integrate genotype, environment and their interactions for multi-environment breeding trials},
volume = {13},
issn = {2662-6810},
shorttitle = {{MMGS}},
url = {https://doi.org/10.1093/hr/uhag035},
doi = {10.1093/hr/uhag035},
abstract = {Accurately predicting the performance of trees and crops across diverse and changing climates is essential for matching genotypes to both current and future environments. Yet modelling the complex interplay among genotype, environment, and phenotype in multi-environment trials remains a major challenge. Here, we introduce a unified framework, polygenic environmental interaction (PEI), which directly models genotype-by-environment interactions through integrating genotypes and environmental covariates. We implemented an ensemble of 15 estimators spanning parametric, non-parametric, and machine-learning approaches. We then benchmarked our framework against the classical reaction norm (RN) using three genetically distinct populations and three traits with variable genetic architectures. Furthermore, we released an open-source R package, Multiple-environments genomic selection (MMGS), on GitHub. Together, our study offers a flexible and computationally efficient approach for multi-environment genomic prediction, enhancing breeding efficiency and providing deeper insights into modelling the genotype-environment-phenotype continuum.},
number = {5},
urldate = {2026-05-15},
journal = {Horticulture Research},
author = {Zhu, Mingjia and Zheng, Zeyu and Liu, Wei and Han, Yu and Mou, Wenjie and Yin, Tongming and Dai, Xiaogang and Wu, Huaitong and Yang, Yongzhi and Zan, Yanjun and Liu, Jianquan},
month = may,
year = {2026},
pages = {uhag035},
}
@article{liu_photosynthesis-related_2026,
title = {Photosynthesis-related genetic and transcriptomic variations contribute to adaptive trait diversity in global {Arabidopsis} thaliana populations},
volume = {26},
issn = {1471-2229},
url = {https://doi.org/10.1186/s12870-026-08279-2},
doi = {10.1186/s12870-026-08279-2},
abstract = {Photosynthesis is the foundational process for carbon fixation in terrestrial ecosystems. Although allelic variations in photosynthesis-related genes have the potential to enhance carbon assimilation efficiency, their functional roles in local adaptation are still not well understood. In this study, we systematically examined the genetic and transcriptomic diversity among globally distributed natural accessions of Arabidopsis thaliana, focusing on 1,103 genes associated with photosynthetic pathways. By assembling chloroplast genomes from 28 representative accessions and integrating whole-genome and transcriptome sequencing data from over 1,000 accessions, we identified extensive allelic variation. Notably, 34.0\% of these genes exhibited regulatory variations through expression quantitative trait locus mapping, including key components such as Rubisco and Rubisco activase. Functional validation demonstrated that overexpression of these genes increased cotyledon size and root length. Additionally, genome-wide and transcriptome-wide association studies revealed that natural selection acting on these allelic variations significantly contributes to local environmental adaptation. Our findings elucidate the connection between genetic variation in photosynthetic pathways and their ecological significance, providing valuable insights for optimizing carbon fixation in dynamic environments.},
language = {en},
number = {1},
urldate = {2026-04-10},
journal = {BMC Plant Biology},
author = {Liu, Wei and Hao, Ruili and Liu, Li and Hou, Jing and Lei, Mengyu and Han, Yu and Zhu, Mingjia and Liang, Lei and Yu, Le and Si, Huan and Liu, Jianquan and Zan, Yanjun and Ji, Yan},
month = feb,
year = {2026},
keywords = {Arabidopsis thaliana, Local adaptation, Natural variation, Photosynthesis pathways},
pages = {468},
}
@article{wang_phytocell_2026,
title = {{PhytoCell}: {An} ensemble learning framework for identifying cell states in plant {scRNA}-seq data},
issn = {2214-5141},
shorttitle = {{PhytoCell}},
url = {https://www.sciencedirect.com/science/article/pii/S2214514126000760},
doi = {10.1016/j.cj.2026.02.021},
abstract = {Single-cell transcriptome sequencing (scRNA-seq) can reveal the roles of diverse cells in an organism, but accurately classifying cell subpopulations and their marker genes remains a challenge. Here, we present PhytoCell, an ensemble learning framework that combines feature selection engineering with machine learning to uncover cell markers and annotate cell subpopulations. We evaluated our approach on 120,000 cells from corollas of the dicotyledonous plant species coyote tobacco (Nicotiana attenuata) and eight tissues from the monocotyledonous plant species rice (Oryza sativa). Comprehensive evaluation across species and tissues demonstrated that PhytoCell effectively eliminates redundant information, identifies key cell markers, improves clustering performance, and accurately classifies cell subpopulations. Importantly, PhytoCell did not rely on prior biological knowledge for selecting cell markers, preserving the biological landscape of the original data. For broader accessibility, we developed a user-friendly web interface that provides convenient tools for users to access cell marker resources and perform predictions for cell type. PhytoCell is freely accessible at https://cgris.net/phyto. PhytoCell is scalable to different sizes of single-cell datasets, representing a valuable resource for precise identification in cell research.},
urldate = {2026-05-19},
journal = {The Crop Journal},
author = {Wang, Hao and Yan, Shen and Ma, Xiaoding and Si, Huan and Lu, Qiong and Chen, Yanqing and Liu, Lijia and Hong, Jingpeng and Xu, Xingjian and Fang, Wei and He, Qiang and Zan, Yanjun and Yang, Aiguo},
month = mar,
year = {2026},
keywords = {Cell subpopulations, Ensemble learning, Machine learning, PhytoCell, scRNA-seq},
}
@article{lei_comprehensive_2025,
title = {Comprehensive analysis of 1,771 transcriptomes from 7 tissues enhance genetic and biological interpretations of maize complex traits},
volume = {15},
issn = {2160-1836},
url = {https://doi.org/10.1093/g3journal/jkaf140},
doi = {10.1093/g3journal/jkaf140},
abstract = {By reanalyzing 1,771 RNA-seq datasets from 7 tissues in a maize diversity panel, we explored the landscape of multi-tissue transcriptome variation, evolution patterns of tissue-specific genes, and built a comprehensive multi-tissue gene regulation atlas to understand the genetic regulation of maize complex traits. Through an integrative analysis of tissue-specific gene regulatory variation with genome-wide association studies, we detected relevant tissue types and several candidate genes for a number of agronomic traits, including leaf during the day for the anthesis-silking interval, leaf during the day for kernel Zeinoxanthin level, and root for ear height, highlighting the potential contribution of tissue-specific gene expression to variation in agronomic traits. Using transcriptome-wide association and colocalization analysis, we associated tissue-specific expression variation of 74 genes to agronomic traits variation. Our findings provide novel insights into the genetic and biological mechanisms underlying maize complex traits, and the multi-tissue regulatory atlas serves as a primary source for biological interpretation, functional validation, and genomic improvement of maize.},
number = {9},
urldate = {2025-09-19},
journal = {G3 Genes{\textbar}Genomes{\textbar}Genetics},
author = {Lei, Mengyu and Si, Huan and Zhu, Mingjia and Han, Yu and Liu, Wei and Dai, Yifei and Ji, Yan and Liu, Zhengwen and Hao, Fan and Hao, Ran and Zhao, Jiarui and Ye, Guoyou and Zan, Yanjun},
month = sep,
year = {2025},
pages = {jkaf140},
}
@article{si_development_2025,
title = {Development of two sets of tobacco chromosome segment substitution lines and {QTL} mapping for agronomic and disease resistance traits},
volume = {226},
issn = {0926-6690},
url = {https://www.sciencedirect.com/science/article/pii/S0926669025001682},
doi = {10.1016/j.indcrop.2025.120622},
abstract = {Chromosome segment substitution lines (CSSLs) represent a powerful genetic resource for quantitative trait loci (QTL) mapping, gene cloning and breeding. Here, we developed two sets of CSSLs consisting of 245 and 128 unique lines, which derived from OX2028 × K326 and Samsun × K326 crosses. On average, each CSSL carried 1.8 and 2.9 introgressed segments in the two sets, with an average physical segment length of approximately 34.3 Mb and 27.6 Mb, respectively. These CSSLs covered ∼97 \% and ∼77 \% of the genomes of OX2028 and Samsun, respectively. By performing QTL mapping based on best linear unbiased prediction (BLUP) of phenotypic traits, we identified a total of 64 QTLs associated with six agronomic traits and three disease resistance traits. These QTLs explained phenotypic variation ranging from 1.5 \% to 50.8 \%. Among them, 22 QTLs detected in OX2028 derived population and 42 detected in Samsun derived CSSLs. Notably, a new QTL for tobacco leaf width, qLW1–1 was narrowed down to an 8-Mb interval on chromosome 1, and NtZY01G00114, encoding an auxin-response factor protein, was considered as the candidate gene. Our study provides valuable genetic resources for tobacco breeding and enhances our understanding of the genetic basis of complex traits in tobacco.},
urldate = {2026-05-19},
journal = {Industrial Crops and Products},
author = {Si, Huan and Wang, Dong and Zan, Yanjun and Liu, Wanfeng and Pu, Wenxuan and Li, Xiaoxu and Mao, Hui and Yang, Xingyou and Song, Shiyang and Wang, Yongda and Jiang, Caihong and Pan, Xuhao and Xiao, Zhiliang and Wen, Liuying and Sun, Yiwen and Liu, Dan and Cheng, Lirui and Yang, Aiguo},
month = apr,
year = {2025},
keywords = {Agronomic traits, CSSLs, Disease resistance, QTL mapping, Tobacco},
pages = {120622},
}
@article{han_divergent_2025,
title = {Divergent {Flowering} {Time} {Responses} to {Increasing} {Temperatures} {Are} {Associated} {With} {Transcriptome} {Plasticity} and {Epigenetic} {Modification} {Differences} at {FLC} {Promoter} {Region} of {Arabidopsis} thaliana},
volume = {34},
copyright = {© 2024 John Wiley \& Sons Ltd.},
issn = {1365-294X},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.17544},
doi = {10.1111/mec.17544},
abstract = {Understanding the genetic, and transcriptomic changes that drive the phenotypic plasticity of fitness traits is a central question in evolutionary biology. In this study, we utilised 152 natural Swedish Arabidopsis thaliana accessions with re-sequenced genomes, transcriptomes and methylomes and measured flowering times (FTs) under two temperature conditions (10°C and 16°C) to address this question. We revealed that the northern accessions exhibited advanced flowering in response to decreased temperature, whereas the southern accessions delayed their flowering, indicating a divergent flowering response. This contrast in flowering responses was associated with the isothermality of their native ranges, which potentially enables the northern accessions to complete their life cycle more rapidly in years with shorter growth seasons. At the transcriptome level, we observed extensive rewiring of gene co-expression networks, with the expression of 25 core genes being associated with the mean FT and its plastic variation. Notably, variations in FLC expression sensitivity between northern and southern accessions were found to be associated with the divergence FT response. Further analysis suggests that FLC expression sensitivity is associated with differences in CG, CHG and CHH methylation at the promoter region. Overall, our study revealed the association between transcriptome plasticity and flowering time plasticity among different accessions, providing evidence for its relevance in ecological adaptation. These findings offer deeper insights into the genetics of rapid responses to environmental changes and ecological adaptation.},
language = {en},
number = {15},
urldate = {2026-05-19},
journal = {Molecular Ecology},
author = {Han, Yu and Liu, Li and Lei, Mengyu and Liu, Wei and Si, Huan and Ji, Yan and Du, Qiao and Zhu, Mingjia and Zhang, Wenjia and Dai, Yifei and Liu, Jianquan and Zan, Yanjun},
year = {2025},
note = {\_eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/mec.17544},
keywords = {ecological adaptation, flowering time, gene co-expression network, plasticity},
pages = {e17544},
}
@article{han_easyomics_2025,
title = {{EasyOmics}: {A} graphical interface for population-scale omics data association, integration, and visualization},
volume = {6},
issn = {2590-3462},
shorttitle = {{EasyOmics}},
url = {https://www.sciencedirect.com/science/article/pii/S2590346225000550},
doi = {10.1016/j.xplc.2025.101293},
abstract = {The rapid growth of population-scale whole-genome resequencing, RNA sequencing, bisulfite sequencing, and metabolomic and proteomic profiling has led quantitative genetics into the era of big omics data. Association analyses of omics data, such as genome-, transcriptome-, proteome-, and methylome-wide association studies, along with integrative analyses of multiple omics datasets, require various bioinformatics tools, which rely on advanced programming skills and command-line interfaces and thus pose challenges for wet-lab biologists. Here, we present EasyOmics, a stand-alone R Shiny application with a user-friendly interface that enables wet-lab biologists to perform population-scale omics data association, integration, and visualization. The toolkit incorporates multiple functions designed to meet the increasing demand for population-scale omics data analyses, including data quality control, heritability estimation, genome-wide association analysis, conditional association analysis, omics quantitative trait locus mapping, omics-wide association analysis, omics data integration, and visualization. A wide range of publication-quality graphs can be prepared in EasyOmics by pointing and clicking. EasyOmics is a platform-independent software that can be run under all operating systems, with a docker container for quick installation. It is freely available to non-commercial users at Docker Hub https://hub.docker.com/r/yuhan2000/easyomics.},
number = {5},
urldate = {2026-05-19},
journal = {Plant Communications},
author = {Han, Yu and Du, Qiao and Dai, Yifei and Gu, Shaobo and Lei, Mengyu and Liu, Wei and Zhang, Wenjia and Zhu, Mingjia and Feng, Landi and Si, Huan and Liu, Jianquan and Zan, Yanjun},
month = may,
year = {2025},
keywords = {association analysis, bioinformatics, data visualization, omics data},
pages = {101293},
}
@article{liu_genome_2025,
title = {Genome and transcriptome analysis provide insights into the genetics and breeding of {Salix} suchowensis growth traits},
volume = {56},
issn = {1573-5095},
url = {https://doi.org/10.1007/s11056-025-10132-7},
doi = {10.1007/s11056-025-10132-7},
abstract = {Shrub willow species (Salix sp.) are ideal for cultivation as bioenergy source and can serve as effective models for genetic research due to their rapid growth rates. Understanding the genetic basis of growth traits in willows can enhance the development of high-quality cultivars. In this study, we performed genome-wide association studies (GWAS), expression quantitative trait locus (eQTL) mapping, and transcriptome-wide association analyses to investigate the genetic basis of plant height (Height) and ground diameter (GD) in a full-sib population of Salix suchowensis derived from eight parental lines, which were phenotyped across three sites in southwest China. We identified 15 quantitative trait loci (QTLs) associated with Height and 10 QTLs associated with GD. Furthermore, we identified 938 eQTLs influencing the expression of 685 genes, including a trans-eQTL hub, as well as eight co-expression modules correlated with Height and GD. Transcriptome-wide association studies (TWAS) revealed five genes linked to Height and three genes associated with GD. By jointly modeling genetic variations and environmental factors, we developed a multi-site predictive model that outperformed the genomic best linear unbiased prediction. This study provides valuable insights into the genetic regulatory mechanisms underlying key growth traits in S. suchowensis and establishes a genomic prediction model to facilitate the rapid genetic improvement of the species.},
language = {en},
number = {6},
urldate = {2025-11-07},
journal = {New Forests},
author = {Liu, Wei and Gu, Shaobo and Zhu, Mingjia and Han, Yu and Dai, Xiaogang and Wu, Huaitong and Yin, Tongming and Guo, Linjie and Feng, Landi and Zan, Yanjun and Liu, Jianquan},
month = oct,
year = {2025},
keywords = {Breeding prediction, Ground diameter, Plant height, QTL, QTL-by-environment interactions, Salix suchowensis},
pages = {63},
}
@article{wang_identification_2025,
title = {Identification, {Characterization}, {Expression} {Profiling} and {Functional} {Analysis} of {Tobacco} {CalS} {Gene} {Family}},
volume = {15},
copyright = {http://creativecommons.org/licenses/by/3.0/},
issn = {2073-4395},
url = {https://www.mdpi.com/2073-4395/15/4/884},
doi = {10.3390/agronomy15040884},
abstract = {Callose plays an important role in plant development and in response to a wide range of biotic and abiotic stresses. However, the systematic identification of callose synthase (CalS), the major enzyme for callose biosynthesis, has been delayed in crops, especially in Solanaceae. In the current research, 18 CalS genes (NtCalS1–NtCalS18) were identified in Nicotiana tabacum and classified into four subfamilies. A comprehensive analysis of their physicochemical properties, gene structure, and evolutionary history highlighted their evolutionary conservation. We also identified a number of NtCalSs that responded to the infection with Phytophthora nicotianae and Ralstonia solanacearum, as well as to drought and cold treatments, by analyzing RNA-seq data. NtCalS1 and NtCalS12, a highly homologous gene pair, were selected to create mutants using the CRISPR-Cas9 technology for their drastic response to Phytophthora nicotianae infection as well as the strong expression levels in roots. The mutants with the simultaneous knockout of NtCalS1 and NtCalS12, compared with the control plants, displayed more resistance to tobacco black shank caused by Phytophthora nicotianae. Furthermore, the real-time quantitative PCR (qRT-PCR) assay showed that the knockout of NtCalS1 and NtCalS12 activated the signaling pathways mediated by plant hormones salicylic acid (SA), jasmonic acid (JA), and ethylene (ET) before and after the infection of Phytophthora nicotianae and thus may have contributed to tobacco immunity against black shank. These findings contribute valuable information for further understanding the roles of CalS genes in tobacco stress responses and provide alternative genes for resistance improvement.},
language = {en},
number = {4},
urldate = {2026-05-19},
journal = {Agronomy},
publisher = {Multidisciplinary Digital Publishing Institute},
author = {Wang, Hong and Meng, He and Qi, Xiaohan and Pan, Yi and Ji, Bailu and Wen, Liuying and Zan, Yanjun and Si, Huan and Wang, Yuanying and Liu, Dan and Yang, Aiguo and Liu, Zhengwen and Cheng, Lirui},
month = apr,
year = {2025},
keywords = {\textit{CalS} gene family, \textit{Nicotiana tabacum}, callose synthase, expression analysis, gene editing, stress response},
pages = {884},
}
@article{liu_origin_2025,
title = {Origin and de novo domestication of sweet orange},
volume = {57},
copyright = {2025 The Author(s)},
issn = {1546-1718},
url = {https://www.nature.com/articles/s41588-025-02122-4},
doi = {10.1038/s41588-025-02122-4},
abstract = {Sweet orange is cultivated worldwide but suffers from various devastating diseases because of its monogenetic background. The elucidation of the origin of a crop facilitates the domestication of new crops that may better cope with new challenges. Here we collected and sequenced 226 citrus accessions and assembled telomere-to-telomere phased diploid genomes of sweet orange and sour orange. On the basis of a high-resolution haplotype-resolved genome analysis, we inferred that sweet orange originated from a sour orange × mandarin cross and confirmed this model using artificial hybridization experiments. We identified defense-related metabolites that potently inhibited the growth of multiple industrially important pathogenic bacteria. We introduced diversity to sweet orange, which showed wide segregation in fruit flavor and disease resistance and produced canker-resistant sweet orange by selecting defense-related metabolites. Our findings elucidate the origin of sweet orange and de novo domesticated disease-resistant sweet oranges, illuminating a strategy for the rapid domestication of perennial crops.},
language = {en},
number = {3},
urldate = {2026-05-19},
journal = {Nature Genetics},
publisher = {Nature Publishing Group},
author = {Liu, Shengjun and Xu, Yuantao and Yang, Kun and Huang, Yue and Lu, Zhihao and Chen, Shulin and Gao, Xiang and Xiao, Gongao and Chen, Peng and Zeng, Xiuli and Wang, Lun and Zheng, Weikang and Liu, Zishuang and Liao, Guanglian and He, Fa and Liu, Junjie and Wan, Pengfei and Ding, Fang and Ye, Junli and Jiao, Wenbiao and Chai, Lijun and Pan, Zhiyong and Zhang, Fei and Lin, Zongcheng and Zan, Yanjun and Guo, Wenwu and Larkin, Robert M. and Xie, Zongzhou and Wang, Xia and Deng, Xiuxin and Xu, Qiang},
month = mar,
year = {2025},
keywords = {Plant breeding, Plant genetics, Plant molecular biology},
pages = {754--762},
}
@article{zan_genome_2025,
title = {The genome and {GeneBank} genomics of allotetraploid {Nicotiana} tabacum provide insights into genome evolution and complex trait regulation},
volume = {57},
copyright = {2025 The Author(s)},
issn = {1546-1718},
url = {https://www.nature.com/articles/s41588-025-02126-0},
doi = {10.1038/s41588-025-02126-0},
abstract = {Nicotiana tabacum is an allotetraploid hybrid of Nicotiana sylvestris and Nicotiana tomentosiformis and a model organism in genetics. However, features of subgenome evolution, expression coordination, genetic diversity and complex traits regulation of N. tabacum remain unresolved. Here we present chromosome-scale assemblies for all three species, and genotype and phenotypic data for 5,196 N. tabacum germplasms. Chromosome rearrangements and epigenetic modifications are associated with genome evolution and expression coordination following polyploidization. Two subgenomes and genes biased toward one subgenome contributed unevenly to complex trait variation. Using 178 marker–trait associations, a reference genotype-to-phenotype map was built for 39 morphological, developmental and disease resistance traits, and a novel gene regulating leaf width was validated. Signatures of positive and polygenic selection during the process of selective breeding were detected. Our study provides insights into genome evolution, complex traits regulation in allotetraploid N. tabacum and the use of GeneBank-scale resources for advancing genetic and genomic research.},
language = {en},
number = {4},
urldate = {2026-05-19},
journal = {Nature Genetics},
publisher = {Nature Publishing Group},
author = {Zan, Yanjun and Chen, Shuai and Ren, Min and Liu, Guoxiang and Liu, Yutong and Han, Yu and Dong, Yang and Zhang, Yao and Si, Huan and Liu, Zhengwen and Liu, Dan and Zhang, Xingwei and Tong, Ying and Li, Yuan and Jiang, Caihong and Wen, Liuying and Xiao, Zhiliang and Sun, Yangyang and Geng, Ruimei and Ji, Yan and Feng, Quanfu and Wang, Yuanying and Ye, Guoyou and Fang, Lingzhao and Chen, Yong and Cheng, Lirui and Yang, Aiguo},
month = apr,
year = {2025},
keywords = {Plant genetics, Plant sciences},
pages = {986--996},
}
@article{feng_dual-trait_2024,
title = {Dual-trait genomic analysis in highly stratified {Arabidopsis} thaliana populations using genome-wide association summary statistics},
volume = {133},
copyright = {2024 The Author(s), under exclusive licence to The Genetics Society},
issn = {1365-2540},
url = {https://www.nature.com/articles/s41437-024-00688-z},
doi = {10.1038/s41437-024-00688-z},
abstract = {Genome-wide association study (GWAS) is a powerful tool to identify genomic loci underlying complex traits. However, the application in natural populations comes with challenges, especially power loss due to population stratification. Here, we introduce a bivariate analysis approach to a GWAS dataset of Arabidopsis thaliana. We demonstrate the efficiency of dual-phenotype analysis to uncover hidden genetic loci masked by population structure via a series of simulations. In real data analysis, a common allele, strongly confounded with population structure, is discovered to be associated with late flowering and slow maturation of the plant. The discovered genetic effect on flowering time is further replicated in independent datasets. Using Mendelian randomization analysis based on summary statistics from our GWAS and expression QTL scans, we predicted and replicated a candidate gene AT1G11560 that potentially causes this association. Further analysis indicates that this locus is co-selected with flowering-time-related genes. The discovered pleiotropic genotype-phenotype map provides new insights into understanding the genetic correlation of complex traits.},
language = {en},
number = {1},
urldate = {2026-05-19},
journal = {Heredity},
publisher = {Nature Publishing Group},
author = {Feng, Xiao and Zan, Yanjun and Li, Ting and Yao, Yue and Ning, Zheng and Li, Jiabei and Charati, Hadi and Xu, Weilin and Wan, Qianhui and Zeng, Dongyu and Zeng, Ziyi and Liu, Yang and Shen, Xia},
month = jul,
year = {2024},
keywords = {Genome-wide association studies, Population genetics, Quantitative trait},
pages = {11--20},
}
@article{yu_shinygsgraphical_2024,
title = {{ShinyGS}—a graphical toolkit with a serial of genetic and machine learning models for genomic selection: application, benchmarking, and recommendations},
volume = {15},
issn = {1664-462X},
shorttitle = {{ShinyGS}—a graphical toolkit with a serial of genetic and machine learning models for genomic selection},
url = {https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2024.1480902/full},
doi = {10.3389/fpls.2024.1480902},
abstract = {Genomic prediction is a powerful approach for improving genetic gain and shortening the breeding cycles in animal and crop breeding programs. A series of statistical and machine learning models has been developed to increase the prediction performance continuously. However, the application of these models requires advanced R programming skills and command-line tools to perform quality control, format input files, and install packages and dependencies, posing challenges for breeders. Here, we present ShinyGS, a stand-alone R Shiny application with a user-friendly interface that allows breeders to perform genomic selection through simple point-and-click actions. This toolkit incorporates 16 methods, including linear models from maximum likelihood and Bayesian framework (BA, BB, BC, BL, and BRR), machine learning models, and a data visualization function. In addition, we benchmarked the performance of all 16 models using multiple populations and traits with varying populations and genetic architecture. Recommendations were given for specific breeding applications. Overall, ShinyGS is a platform-independent software that can be run on all operating systems with a Docker container for quick installation. It is freely available to non-commercial users at Docker Hub (https://hub.docker.com/r/yfd2/ags).},
language = {en},
urldate = {2026-05-19},
journal = {Frontiers in Plant Science},
publisher = {Frontiers},
author = {Yu, Le and Dai, Yifei and Zhu, Mingjia and Guo, Linjie and Ji, Yan and Si, Huan and Cheng, Lirui and Zhao, Tao and Zan, Yanjun},
month = dec,
year = {2024},
keywords = {BLUP, breeding, genomic prediction, graphical toolkit, machine learning},
}
@article{jin_complex_2023,
title = {Complex genetic architecture underlying the plasticity of maize agronomic traits},
volume = {4},
issn = {2590-3462},
url = {https://www.sciencedirect.com/science/article/pii/S2590346222003108},
doi = {10.1016/j.xplc.2022.100473},
abstract = {Phenotypic plasticity is the ability of a given genotype to produce multiple phenotypes in response to changing environmental conditions. Understanding the genetic basis of phenotypic plasticity and establishing a predictive model is highly relevant to future agriculture under a changing climate. Here we report findings on the genetic basis of phenotypic plasticity for 23 complex traits using a diverse maize population planted at five sites with distinct environmental conditions. We found that latitude-related environmental factors were the main drivers of across-site variation in flowering time traits but not in plant architecture or yield traits. For the 23 traits, we detected 109 quantitative trait loci (QTLs), 29 for mean values, 66 for plasticity, and 14 for both parameters, and 80\% of the QTLs interacted with latitude. The effects of several QTLs changed in magnitude or sign, driving variation in phenotypic plasticity. We experimentally validated one plastic gene, ZmTPS14.1, whose effect was likely mediated by the compensation effect of ZmSPL6 from a downstream pathway. By integrating genetic diversity, environmental variation, and their interaction into a joint model, we could provide site-specific predictions with increased accuracy by as much as 9.9\%, 2.2\%, and 2.6\% for days to tassel, plant height, and ear weight, respectively. This study revealed a complex genetic architecture involving multiple alleles, pleiotropy, and genotype-by-environment interaction that underlies variation in the mean and plasticity of maize complex traits. It provides novel insights into the dynamic genetic architecture of agronomic traits in response to changing environments, paving a practical way toward precision agriculture.},
number = {3},
urldate = {2026-05-19},
journal = {Plant Communications},
author = {Jin, Minliang and Liu, Haijun and Liu, Xiangguo and Guo, Tingting and Guo, Jia and Yin, Yuejia and Ji, Yan and Li, Zhenxian and Zhang, Jinhong and Wang, Xiaqing and Qiao, Feng and Xiao, Yingjie and Zan, Yanjun and Yan, Jianbing},
month = may,
year = {2023},
keywords = {QTL-by-environment interaction, complex traits, crop improvement, phenotypic plasticity},
pages = {100473},
}
@article{ronneburg_low-coverage_2023,
title = {Low-coverage sequencing in a deep intercross of the {Virginia} body weight lines provides insight to the polygenic genetic architecture of growth: novel loci revealed by increased power and improved genome-coverage},
volume = {102},
issn = {0032-5791},
shorttitle = {Low-coverage sequencing in a deep intercross of the {Virginia} body weight lines provides insight to the polygenic genetic architecture of growth},
url = {https://www.sciencedirect.com/science/article/pii/S0032579122004990},
doi = {10.1016/j.psj.2022.102203},
abstract = {Genetic dissection of highly polygenic traits is a challenge, in part due to the power necessary to confidently identify loci with minor effects. Experimental crosses are valuable resources for mapping such traits. Traditionally, genome-wide analyses of experimental crosses have targeted major loci using data from a single generation (often the F2) with individuals from later generations being generated for replication and fine-mapping. Here, we aim to confidently identify minor-effect loci contributing to the highly polygenic basis of the long-term, bi-directional selection responses for 56-d body weight in the Virginia body weight chicken lines. To achieve this, a strategy was developed to make use of data from all generations (F2–F18) of the advanced intercross line, developed by crossing the low and high selected lines after 40 generations of selection. A cost-efficient low-coverage sequencing based approach was used to obtain high-confidence genotypes in 1Mb bins across 99.3\% of the chicken genome for {\textgreater}3,300 intercross individuals. In total, 12 genome-wide significant, and 30 additional suggestive QTL reaching a 10\% FDR threshold, were mapped for 56-d body weight. Only 2 of these QTL reached genome-wide significance in earlier analyses of the F2 generation. The minor-effect QTL mapped here were generally due to an overall increase in power by integrating data across generations, with contributions from increased genome-coverage and improved marker information content. The 12 significant QTL explain {\textgreater}37\% of the difference between the parental lines, three times more than 2 previously reported significant QTL. The 42 significant and suggestive QTL together explain {\textgreater}80\%. Making integrated use of all available samples from multiple generations in experimental crosses are economically feasible using the low-cost, sequencing-based genotyping strategies outlined here. 
Our empirical results illustrate the value of this strategy for mapping novel minor-effect loci contributing to complex traits to provide a more confident, comprehensive view of the individual loci that form the genetic basis of the highly polygenic, long-term selection responses for 56-d body weight in the Virginia body weight chicken lines.},
number = {5},
urldate = {2026-05-19},
journal = {Poultry Science},
author = {Rönneburg, T. and Zan, Y. and Honaker, C. F. and Siegel, P. B. and Carlborg, Ö.},
month = may,
year = {2023},
keywords = {QTL mapping, advanced intercross line, body weight, low-coverage sequencing},
pages = {102203},
}
@article{kang_pan-genome_2023,
title = {The pan-genome and local adaptation of {Arabidopsis} thaliana},
volume = {14},
copyright = {2023 The Author(s)},
issn = {2041-1723},
url = {https://www.nature.com/articles/s41467-023-42029-4},
doi = {10.1038/s41467-023-42029-4},
abstract = {Arabidopsis thaliana serves as a model species for investigating various aspects of plant biology. However, the contribution of genomic structural variations (SVs) and their associate genes to the local adaptation of this widely distribute species remains unclear. Here, we de novo assemble chromosome-level genomes of 32 A. thaliana ecotypes and determine that variable genes expand the gene pool in different ecotypes and thus assist local adaptation. We develop a graph-based pan-genome and identify 61,332 SVs that overlap with 18,883 genes, some of which are highly involved in ecological adaptation of this species. For instance, we observe a specific 332 bp insertion in the promoter region of the HPCA1 gene in the Tibet-0 ecotype that enhances gene expression, thereby promotes adaptation to alpine environments. These findings augment our understanding of the molecular mechanisms underlying the local adaptation of A. thaliana across diverse habitats.},
language = {en},
number = {1},
urldate = {2026-05-19},
journal = {Nature Communications},
publisher = {Nature Publishing Group},
author = {Kang, Minghui and Wu, Haolin and Liu, Huanhuan and Liu, Wenyu and Zhu, Mingjia and Han, Yu and Liu, Wei and Chen, Chunlin and Song, Yan and Tan, Luna and Yin, Kangqun and Zhao, Yusen and Yan, Zhen and Lou, Shangling and Zan, Yanjun and Liu, Jianquan},
month = oct,
year = {2023},
keywords = {Evolutionary biology, Natural variation in plants, Plant evolution, Structural variation},
pages = {6259},
}
@article{li_pig_2023,
title = {The pig pangenome provides insights into the roles of coding structural variations in genetic diversity and adaptation},
volume = {33},
issn = {1088-9051, 1549-5469},
url = {https://genome.cshlp.org/content/genome/33/10/1833},
doi = {10.1101/gr.277638.122},
abstract = {Structural variations have emerged as an important driving force for genome evolution and phenotypic variation in various organisms, yet their contributions to genetic diversity and adaptation in domesticated animals remain largely unknown. Here we constructed a pangenome based on 250 sequenced individuals from 32 pig breeds in Eurasia and systematically characterized coding sequence presence/absence variations (PAVs) within pigs. We identified 308.3-Mb nonreference sequences and 3438 novel genes absent from the current reference genome. Gene PAV analysis showed that 16.8\% of the genes in the pangene catalog undergo PAV. A number of newly identified dispensable genes showed close associations with adaptation. For instance, several novel swine leukocyte antigen (SLA) genes discovered in nonreference sequences potentially participate in immune responses to productive and respiratory syndrome virus (PRRSV) infection. We delineated previously unidentified features of the pig mobilome that contained 490,480 transposable element insertion polymorphisms (TIPs) resulting from recent mobilization of 970 TE families, and investigated their population dynamics along with influences on population differentiation and gene expression. In addition, several candidate adaptive TE insertions were detected to be co-opted into genes responsible for responses to hypoxia, skeletal development, regulation of heart contraction, and neuronal cell development, likely contributing to local adaptation of Tibetan wild boars. These findings enhance our understanding on hidden layers of the genetic diversity in pigs and provide novel insights into the role of SVs in the evolutionary adaptation of mammals.},
language = {en},
number = {10},
urldate = {2026-05-19},
journal = {Genome Research},
publisher = {Cold Spring Harbor Laboratory Press},
author = {Li, Zhengcao and Liu, Xiaohong and Wang, Chen and Li, Zhenyang and Jiang, Bo and Zhang, Ruifeng and Tong, Lu and Qu, Youping and He, Sheng and Chen, Haifan and Mao, Yafei and Li, Qingnan and Pook, Torsten and Wu, Yu and Zan, Yanjun and Zhang, Hui and Li, Lu and Wen, Keying and Chen, Yaosheng},
month = nov,
year = {2023},
pages = {1833--1847},
}
@article{chen_genetic_2022,
title = {Genetic architecture behind developmental and seasonal control of tree growth and wood properties in {Norway} spruce},
volume = {13},
issn = {1664-462X},
url = {https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2022.927673/full},
doi = {10.3389/fpls.2022.927673},
abstract = {Genetic control of tree growth and wood formation varies depending on the age of the tree and the time of the year. Single-locus, multi-locus, and multitrait genome-wide association studies (GWAS) were conducted on 34 growth and wood property traits in 1303 Norway spruce individuals using exome capture to cover {\textasciitilde}130K single nucleotide polymorphisms (SNPs). GWAS identified associations to the different wood traits in a total of 85 gene models, and several of these were validated in a population of the mother trees. A multi-locus GWAS model identified more SNPs associated with the studied traits than single-locus or multivariate models. Changes in tree age and annual season influenced the genetic architecture of growth and wood properties in unique ways, manifested by non-overlapping SNP loci. In addition to completely novel candidate genes, SNPs were located in genes previously associated with wood formation, such as cellulose synthases and a NAC transcription factor, but that have not been earlier linked to seasonal or age-dependent regulation of wood properties. Interestingly, SNPs associated with the width of the year rings were identified in homologs of Arabidopsis thaliana BARELY ANY MERISTEM 1 and rice BIG GRAIN 1 which have been previously shown to control cell division and biomass production. The results provide tools for future Norway spruce breeding and functional studies.},
language = {en},
urldate = {2026-04-10},
journal = {Frontiers in Plant Science},
publisher = {Frontiers},
author = {Chen, Zhi-Qiang and Zan, Yanjun and Zhou, Linghua and Karlsson, Bo and Tuominen, Hannele and García-Gil, Maria Rosario and Wu, Harry X.},
month = aug,
year = {2022},
keywords = {Developmental Stage, Norway spruce, Seasonal variation, Wood properties, genome-wide association},
}
@article{zhou_graph_2022,
title = {Graph pangenome captures missing heritability and empowers tomato breeding},
volume = {606},
copyright = {2022 The Author(s)},
issn = {1476-4687},
url = {https://www.nature.com/articles/s41586-022-04808-9},
doi = {10.1038/s41586-022-04808-9},
abstract = {Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24\% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.},
language = {en},
number = {7914},
urldate = {2026-05-19},
journal = {Nature},
publisher = {Nature Publishing Group},
author = {Zhou, Yao and Zhang, Zhiyang and Bao, Zhigui and Li, Hongbo and Lyu, Yaqing and Zan, Yanjun and Wu, Yaoyao and Cheng, Lin and Fang, Yuhan and Wu, Kun and Zhang, Jinzhe and Lyu, Hongjun and Lin, Tao and Gao, Qiang and Saha, Surya and Mueller, Lukas and Fei, Zhangjun and Städler, Thomas and Xu, Shizhong and Zhang, Zhiwu and Speed, Doug and Huang, Sanwen},
month = jun,
year = {2022},
keywords = {Agricultural genetics, Genome-wide association studies, Genomics, Plant breeding, Structural variation},
pages = {527--534},
}
@article{guo_researching_2022,
title = {Researching on the fine structure and admixture of the worldwide chicken population reveal connections between populations and important events in breeding history},
volume = {15},
copyright = {© 2021 The Authors. Evolutionary Applications published by John Wiley \& Sons Ltd.},
issn = {1752-4571},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/eva.13241},
doi = {10.1111/eva.13241},
abstract = {Here, we have evaluated the general genomic structure and diversity and studied the divergence resulting from selection and historical admixture events for a collection of worldwide chicken breeds. In total, 636 genomes (43 populations) were sequenced from chickens of American, Chinese, Indonesian, and European origin. Evaluated populations included wild junglefowl, rural indigenous chickens, breeds that have been widely used to improve modern western poultry populations and current commercial stocks bred for efficient meat and egg production. In-depth characterizations of the genome structure and genomic relationships among these populations were performed, and population admixture events were investigated. In addition, the genomic architectures of several domestication traits and central documented events in the recent breeding history were explored. Our results provide detailed insights into the contributions from population admixture events described in the historical literature to the genomic variation in the domestic chicken. In particular, we find that the genomes of modern chicken stocks used for meat production both in eastern (Asia) and western (Europe/US) agriculture are dominated by contributions from heavy Asian breeds. Further, by exploring the link between genomic selective divergence and pigmentation, connections to functional genes feather coloring were confirmed.},
language = {en},
number = {4},
urldate = {2026-05-19},
journal = {Evolutionary Applications},
author = {Guo, Ying and Ou, Jen-Hsiang and Zan, Yanjun and Wang, Yuzhe and Li, Huifang and Zhu, Chunhong and Chen, Kuanwei and Zhou, Xin and Hu, Xiaoxiang and Carlborg, Örjan},
year = {2022},
keywords = {Asian breeds, admixture, chickens, genomic structure, selection},
pages = {553--564},
}
@article{niu_chinese_2022,
title = {The {Chinese} pine genome and methylome unveil key features of conifer evolution},
volume = {185},
issn = {0092-8674, 1097-4172},
url = {https://www.cell.com/cell/abstract/S0092-8674(21)01428-8},
doi = {10/gnw8q5},
abstract = {Conifers dominate the world’s forest ecosystems and are the most widely planted tree species. Their giant and complex genomes present great challenges for assembling a complete reference genome for evolutionary and genomic studies. We present a 25.4-Gb chromosome-level assembly of Chinese pine (Pinus tabuliformis) and revealed that its genome size is mostly attributable to huge intergenic regions and long introns with high transposable element (TE) content. Large genes with long introns exhibited higher expression levels. Despite a lack of recent whole-genome duplication, 91.2\% of genes were duplicated through dispersed duplication, and expanded gene families are mainly related to stress responses, which may underpin conifers’ adaptation, particularly in cold and/or arid conditions. The reproductive regulation network is distinct compared with angiosperms. Slow removal of TEs with high-level methylation may have contributed to genomic expansion. This study provides insights into conifer evolution and resources for advancing research on conifer adaptation and development.},
language = {en},
number = {1},
urldate = {2022-02-04},
journal = {Cell},
author = {Niu, Shihui and Li, Jiang and Bo, Wenhao and Yang, Weifei and Zuccolo, Andrea and Giacomello, Stefania and Chen, Xi and Han, Fangxu and Yang, Junhe and Song, Yitong and Nie, Yumeng and Zhou, Biao and Wang, Peiyi and Zuo, Quan and Zhang, Hui and Ma, Jingjing and Wang, Jun and Wang, Lvji and Zhu, Qianya and Zhao, Huanhuan and Liu, Zhanmin and Zhang, Xuemei and Liu, Tao and Pei, Surui and Li, Zhimin and Hu, Yao and Yang, Yehui and Li, Wenzhao and Zan, Yanjun and Zhou, Linghua and Lin, Jinxing and Yuan, Tongqi and Li, Wei and Li, Yue and Wei, Hairong and Wu, Harry X.},
month = jan,
year = {2022},
keywords = {Chinese pine, chromosome-level genome, climate adaptation, conifer evolution, conifer reproduction, gene expression, genome expansion, long intron, methylome},
pages = {204--217.e14},
}
@article{ji_fully_2021,
title = {A fully assembled plastid‐encoded {RNA} polymerase complex detected in etioplasts and proplastids in {Arabidopsis}},
volume = {171},
issn = {0031-9317, 1399-3054},
url = {https://onlinelibrary.wiley.com/doi/10.1111/ppl.13256},
doi = {10.1111/ppl.13256},
language = {en},
number = {3},
urldate = {2021-06-07},
journal = {Physiologia Plantarum},
author = {Ji, Yan and Lehotai, Nóra and Zan, Yanjun and Dubreuil, Carole and Díaz, Manuel Guinea and Strand, Åsa},
month = mar,
year = {2021},
pages = {435--446},
}
@article{bernhardsson_development_2021,
title = {Development of a highly efficient {50K} single nucleotide polymorphism genotyping array for the large and complex genome of {Norway} spruce (\textit{{Picea} abies} {L}. {Karst}) by whole genome resequencing and its transferability to other spruce species},
volume = {21},
issn = {1755-098X, 1755-0998},
url = {https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.13292},
doi = {10.1111/1755-0998.13292},
language = {en},
number = {3},
urldate = {2021-06-07},
journal = {Molecular Ecology Resources},
author = {Bernhardsson, Carolina and Zan, Yanjun and Chen, Zhiqiang and Ingvarsson, Pär K. and Wu, Harry X.},
month = apr,
year = {2021},
pages = {880--896},
}
@article{wang_genomic_2021,
title = {Genomic basis of high-altitude adaptation in {Tibetan} \textit{{Prunus}} fruit trees},
volume = {31},
issn = {0960-9822},
url = {https://www.sciencedirect.com/science/article/pii/S0960982221008915},
doi = {10.1016/j.cub.2021.06.062},
abstract = {The Great Himalayan Mountains and their foothills are believed to be the place of origin and development of many plant species. The genetic basis of adaptation to high plateaus is a fascinating topic that is poorly understood at the population level. We comprehensively collected and sequenced 377 accessions of Prunus germplasm along altitude gradients ranging from 2,067 to 4,492 m in the Himalayas. We de novo assembled three high-quality genomes of Tibetan Prunus species. A comparative analysis of Prunus genomes indicated a remarkable expansion of the SINE retrotransposons occurred in the genomes of Tibetan species. We observed genetic differentiation between Tibetan peaches from high and low altitudes and that genes associated with light stress signaling, especially UV stress signaling, were enriched in the differentiated regions. By profiling the metabolomes of Tibetan peach fruit, we determined 379 metabolites had significant genetic correlations with altitudes and that in particular phenylpropanoids were positively correlated with altitudes. We identified 62 Tibetan peach-specific SINEs that colocalized with metabolites differentially accumulated in Tibetan relative to cultivated peach. We demonstrated that two SINEs were inserted in a locus controlling the accumulation of 3-O-feruloyl quinic acid. SINE1 was specific to Tibetan peach. SINE2 was predominant in high altitudes and associated with the accumulation of 3-O-feruloyl quinic acid. These genomic and metabolic data for Prunus populations native to the Himalayan region indicate that the expansion of SINE retrotransposons helped Tibetan Prunus species adapt to the harsh environment of the Himalayan plateau by promoting the accumulation of beneficial metabolites.},
number = {17},
urldate = {2026-05-19},
journal = {Current Biology},
author = {Wang, Xia and Liu, Shengjun and Zuo, Hao and Zheng, Weikang and Zhang, Shanshan and Huang, Yue and Pingcuo, Gesang and Ying, Hong and Zhao, Fan and Li, Yuanrong and Liu, Junwei and Yi, Ting-Shuang and Zan, Yanjun and Larkin, Robert M. and Deng, Xiuxin and Zeng, Xiuli and Xu, Qiang},
month = sep,
year = {2021},
keywords = {Himalayas, SINE insertion, Tibetan peach, UV, metabolome, phenylpropanoids},
pages = {3848--3860.e8},
}
@article{chen_leveraging_2021,
title = {Leveraging breeding programs and genomic data in {Norway} spruce ({Picea} abies {L}. {Karst}) for {GWAS} analysis},
volume = {22},
issn = {1474-760X},
url = {https://doi.org/10.1186/s13059-021-02392-1},
doi = {10.1186/s13059-021-02392-1},
abstract = {Genome-wide association studies (GWAS) identify loci underlying the variation of complex traits. One of the main limitations of GWAS is the availability of reliable phenotypic data, particularly for long-lived tree species. Although an extensive amount of phenotypic data already exists in breeding programs, accounting for its high heterogeneity is a great challenge. We combine spatial and factor-analytics analyses to standardize the heterogeneous data from 120 field experiments of 483,424 progenies of Norway spruce to implement the largest reported GWAS for trees using 134 605 SNPs from exome sequencing of 5056 parental trees.},
number = {1},
urldate = {2021-10-14},
journal = {Genome Biology},
author = {Chen, Zhi-Qiang and Zan, Yanjun and Milesi, Pascal and Zhou, Linghua and Chen, Jun and Li, Lili and Cui, BinBin and Niu, Shihui and Westin, Johan and Karlsson, Bo and García-Gil, Maria Rosario and Lascoux, Martin and Wu, Harry X.},
month = jun,
year = {2021},
keywords = {Budburst stage, Frost damage, Genome-wide association study, MAP3K gene, Norway spruce, Wood quality},
pages = {179},
}
@article{zan_dissecting_2020,
title = {Dissecting the {Genetic} {Regulation} of {Yeast} {Growth} {Plasticity} in {Response} to {Environmental} {Changes}},
volume = {11},
copyright = {http://creativecommons.org/licenses/by/3.0/},
issn = {2073-4425},
url = {https://www.mdpi.com/2073-4425/11/11/1279},
doi = {10.3390/genes11111279},
abstract = {Variable individual responses to environmental changes, such as phenotype plasticity, are heritable, with some genotypes being robust and others plastic. This variation for plasticity contributes to variance in complex traits as genotype-by-environment interactions (G × E). However, the genetic basis of this variability in responses to the same external stimuli is still largely unknown. In an earlier study of a large haploid segregant yeast population, genotype-by-genotype-by-environment interactions were found to make important contributions to the release of genetic variation in growth responses to alterations of the growth medium. Here, we explore the genetic basis for heritable variation of different measures of phenotype plasticity in the same dataset. We found that the central loci in the environmentally dependent epistatic networks were associated with overall measures of plasticity, while the specific measures of plasticity identified a more diverse set of loci. Based on this, a rapid one-dimensional genome-wide association (GWA) approach to overall plasticity is proposed as a strategy to efficiently identify key epistatic loci contributing to the phenotype plasticity. The study thus provided both analytical strategies and a deeper understanding of the complex genetic regulation of phenotype plasticity in yeast growth.},
language = {en},
number = {11},
urldate = {2026-05-19},
journal = {Genes},
publisher = {Multidisciplinary Digital Publishing Institute},
author = {Zan, Yanjun and Carlborg, Örjan},
month = nov,
year = {2020},
keywords = {epistasis, genetic networks, genotype by environment interactions, phenotype plasticity, yeast growth},
pages = {1279},
}
@article{zan_dynamic_2020,
title = {Dynamic genetic architecture of yeast response to environmental perturbation shed light on origin of cryptic genetic variation},
volume = {16},
issn = {1553-7404},
url = {https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008801},
doi = {10.1371/journal.pgen.1008801},
abstract = {Cryptic genetic variation could arise from, for example, Gene-by-Gene (G-by-G) or Gene-by-Environment (G-by-E) interactions. The underlying molecular mechanisms and how they influence allelic effects and the genetic variance of complex traits is largely unclear. Here, we empirically explored the role of environmentally influenced epistasis on the suppression and release of cryptic variation by reanalysing a dataset of 4,390 haploid yeast segregants phenotyped on 20 different media. The focus was on 130 epistatic loci, each contributing to segregant growth in at least one environment and that together explained most (69–100\%) of the narrow sense heritability of growth in the individual environments. We revealed that the epistatic growth network reorganised upon environmental changes to alter the estimated marginal (additive) effects of the individual loci, how multi-locus interactions contributed to individual segregant growth and the level of expressed genetic variance in growth. The estimated additive effects varied most across environments for loci that were highly interactive network hubs in some environments but had few or no interactors in other environments, resulting in changes in total genetic variance across environments. This environmentally dependent epistasis was thus an important mechanism for the suppression and release of cryptic variation in this population. Our findings increase the understanding of the complex genetic mechanisms leading to cryptic variation in populations, providing a basis for future studies on the genetic maintenance of trait robustness and development of genetic models for studying and predicting selection responses for quantitative traits in breeding and evolution.},
language = {en},
number = {5},
urldate = {2026-05-19},
journal = {PLOS Genetics},
publisher = {Public Library of Science},
author = {Zan, Yanjun and Carlborg, Örjan},
month = may,
year = {2020},
keywords = {Epistasis, Genetic loci, Genetic networks, Genetic polymorphism, Genetics, Interaction networks, Population genetics, Quantitative trait loci},
pages = {e1008801},
}
@article{yang_haplotype_2020,
title = {Haplotype {Purging} after {Relaxation} of {Selection} in {Lines} of {Chickens} {That} {Had} {Undergone} {Long}-{Term} {Selection} for {High} and {Low} {Body} {Weight}},
volume = {11},
copyright = {http://creativecommons.org/licenses/by/3.0/},
issn = {2073-4425},
url = {https://www.mdpi.com/2073-4425/11/6/630},
doi = {10.3390/genes11060630},
abstract = {Bi-directional selection for increased and decreased 56-day body weights (BW56) has been applied to two lines of White Plymouth Rock chickens—the Virginia high (HWS) and low (LWS) body weight lines. Correlated responses have been observed, including negative effects on traits related to fitness. Here, we use high and low body weight as proxies for fitness. On a genome-wide level, relaxed lines (HWR, LWR) bred from HWS and LWS purged some genetic variants in the selected lines. Whole-genome re-sequencing was here used to identify individual loci where alleles that accumulated during directional selection were purged when selection was relaxed. In total, 11 loci with significant purging signals were identified, five in the low (LW) and six in the high (HW) body weight lineages. Associations between purged haplotypes in these loci and BW56 were tested in an advanced intercross line (AIL). Two loci with purging signals and haplotype associations to BW56 are particularly interesting for further functional characterization, one locus on chromosome 6 in the LW covering the sour-taste receptor gene PKD2L1, a functional candidate gene for the decreased appetite observed in the LWS and a locus on chromosome 20 in the HW containing a skeletal muscle hypertrophy gene, DNTTIP1.},
language = {en},
number = {6},
urldate = {2026-05-19},
journal = {Genes},
publisher = {Multidisciplinary Digital Publishing Institute},
author = {Yang, Yunzhou and Zan, Yanjun and Honaker, Christa F. and Siegel, Paul B. and Carlborg, Örjan},
month = jun,
year = {2020},
keywords = {Virginia chicken lines, advanced intercross line, body weight, directional selection, domestication, haplotype, purging, relaxed selection},
pages = {630},
}
@article{zan_polygenic_2019,
title = {A {Polygenic} {Genetic} {Architecture} of {Flowering} {Time} in the {Worldwide} {Arabidopsis} thaliana {Population}},
volume = {36},
issn = {0737-4038},
url = {https://doi.org/10.1093/molbev/msy203},
doi = {10.1093/molbev/msy203},
abstract = {Here, we report an empirical study of the polygenic basis underlying the evolution of complex traits. Flowering time variation measured at 10 and 16°C in the 1,001-genomes Arabidopsis thaliana collection of natural accessions were used as a model. The polygenic architecture of flowering time was defined as the 48 loci that were significantly associated with flowering time—at 10 and/or 16°C and/or their difference—in this population. Contributions from alleles at flowering time associated loci to global and local adaptation were explored by evaluating their distribution across genetically and geographically defined subpopulations across the native range of the species. The dynamics in the genetic architecture of flowering time in response to temperature was evaluated by estimating how the effects of these loci on flowering changed with growth temperature. Overall, the genetic basis of flowering time was stable—about 2/3 of the flowering time loci had similar effects at 10°C and 16°C—but many loci were involved in gene by temperature interactions. Globally present alleles, mostly of moderate effect, contributed to the differences in flowering times between the subpopulations via subtle changes in allele frequencies. More extreme local adaptations were, on several occasions, due to regional alleles with relatively large effects, and their linkage disequilibrium-patterns suggest coevolution of functionally connected alleles within local populations. Overall, these findings provide a significant contribution to our understanding about the possible modes of global and local evolution of a complex adaptive trait in A. thaliana.},
number = {1},
urldate = {2026-05-19},
journal = {Molecular Biology and Evolution},
author = {Zan, Yanjun and Carlborg, Örjan},
month = jan,
year = {2019},
pages = {141--154},
}
@article{guo_genomic_2019,
title = {A genomic inference of the {White} {Plymouth} {Rock} genealogy},
volume = {98},
issn = {0032-5791},
url = {https://www.sciencedirect.com/science/article/pii/S0032579119457283},
doi = {10.3382/ps/pez411},
abstract = {Crossing of populations has been, and still is, a central component in domestication and breed and variety formation. It is a way for breeders to utilize heterosis and to introduce new genetic variation into existing plant and livestock populations. During the mid-19th century, several chicken breeds that had been introduced to America from Europe and Asia became the founders for those formed in the USA. Historical records about the genealogy of these populations are often unclear and inconsistent. Here, we used genomics in an attempt to describe the ancestry of the White Plymouth Rock (WPR) chicken. In total, 150 chickens from the WPR and 8 other stocks that historical records suggested contributed to its formation were whole-genome re-sequenced. The admixture analyses of the autosomal and sex chromosomes showed that the WPR was likely founded as a cross between a paternal lineage that was primarily Dominique, and a maternal lineage where Black Java and Cochin contributed in essentially equal proportions. These results were consistent and provided quantification with the historical records that they were the main contributors to the WPR. The genomic analyses also revealed genome-wide contributions ({\textless}10\% each) by Brahma, Langshan, and Black Minorca. When viewed on an individual chromosomal basis, contributions varied considerably among stocks.},
number = {11},
urldate = {2026-05-19},
journal = {Poultry Science},
author = {Guo, Y. and Lillie, M. and Zan, Y. and Beranger, J. and Martin, A. and Honaker, C. F. and Siegel, P. B. and Carlborg, Ö.},
month = nov,
year = {2019},
keywords = {admixture, ancestry, chickens, domestication, phenotype–genotype interface},
pages = {5272--5280},
}
@article{zhang_genome-wide_2019,
title = {Genome-wide association studies revealed candidate genes for tail fat deposition and body size in the {Hulun} {Buir} sheep},
volume = {136},
copyright = {© 2019 Blackwell Verlag GmbH},
issn = {1439-0388},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/jbg.12402},
doi = {10.1111/jbg.12402},
abstract = {Fat-tailed sheep have a unique characteristic of depositing fat in their tails. In the present study, we conducted genome-wide association studies (GWAS) on traits related to tail fat deposition and body size in the Hulun Buir sheep. A total number of 300 individuals belonging to two fat-tailed lines of the Hulun Buir sheep breed genotyped with the Ovine Infinium HD SNP BeadChip were included in the current study. Two mixed models, one for continuous and one for binary phenotypic traits, were employed to analyse ten traits, that is, body length (BL), body height (BH), chest girth (CG), tail length (TL), tail width (TW), tail circumference (TC), carcass weight (CW), tail fat weight (TF), ratio of CW to TF (RCT) and tail type (TT). We identified 7, 6, 7, 2, 10 and 1 SNPs significantly associated with traits TF, CW, RCT, TW, TT and CG, respectively. Their associated genomic regions harboured 42 positional candidate genes. Out of them, 13 candidate genes including SMURF2, FBF1, DTNBP1, SETD7 and RBM11 have been associated with fat metabolism in sheep. The RBM11 gene has already been identified in a previous study on signatures of selection in this specific sheep population. Two more genes, that is, SMARCA5 and GAB1 were associated with body size in sheep. The present study has identified candidate genes that might be implicated in tail fat deposition and body size in sheep.},
language = {en},
number = {5},
urldate = {2026-05-19},
journal = {Journal of Animal Breeding and Genetics},
author = {Zhang, Tongyu and Gao, Hongding and Sahana, Goutam and Zan, Yanjun and Fan, Hongying and Liu, Jiaxin and Shi, Liangyu and Wang, Hongwei and Du, Lixin and Wang, Lixian and Zhao, Fuping},
year = {2019},
keywords = {Hulun Buir sheep, candidate genes, fat deposition, genome-wide association study, tail type},
pages = {362--370},
}
@article{zan_genotyping_2019,
title = {Genotyping by low-coverage whole-genome sequencing in intercross pedigrees from outbred founders: a cost-efficient approach},
volume = {51},
issn = {1297-9686},
shorttitle = {Genotyping by low-coverage whole-genome sequencing in intercross pedigrees from outbred founders},
url = {https://doi.org/10.1186/s12711-019-0487-1},
doi = {10.1186/s12711-019-0487-1},
abstract = {Experimental intercrosses between outbred founder populations are powerful resources for mapping loci that contribute to complex traits i.e. quantitative trait loci (QTL). Here, we present an approach and its accompanying software for high-resolution reconstruction of founder mosaic genotypes in the intercross offspring from such populations using whole-genome high-coverage sequence data on founder individuals ({\textasciitilde} 30×) and very low-coverage sequence data on intercross individuals ({\textless} 0.5×). Sets of founder-line informative markers were selected for each full-sib family and used to infer the founder mosaic genotypes of the intercross individuals. The application of this approach and the quality of the estimated genome-wide genotypes are illustrated in a large F2 pedigree between two divergently selected lines of chickens.},
language = {en},
number = {1},
urldate = {2026-05-19},
journal = {Genetics Selection Evolution},
author = {Zan, Yanjun and Payen, Thibaut and Lillie, Mette and Honaker, Christa F. and Siegel, Paul B. and Carlborg, Örjan},
month = aug,
year = {2019},
pages = {44},
}
@article{zan_multilocus_2018,
title = {A multilocus association analysis method integrating phenotype and expression data reveals multiple novel associations to flowering time variation in wild-collected {Arabidopsis} thaliana},
volume = {18},
copyright = {© 2018 John Wiley \& Sons Ltd},
issn = {1755-0998},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.12757},
doi = {10.1111/1755-0998.12757},
abstract = {The adaptation to a new habitat often results in a confounding between genomewide genotype and beneficial alleles. When the confounding is strong, or the allelic effects are weak, it is a major statistical challenge to detect the adaptive polymorphisms. We describe a novel approach to dissect polygenic traits in natural populations. First, candidate adaptive loci are identified by screening for loci directly associated with the adaptive trait or the expression of genes known to affect it. Then, a multilocus genetic architecture is inferred using a backward elimination association analysis across all candidate loci with an adaptive false discovery rate-based threshold. Effects of population stratification are controlled by accounting for genomic kinship in both steps of the analysis and also by simultaneously testing all candidate loci in the multilocus model. We illustrate the method by exploring the polygenic basis of an important adaptive trait, flowering time in Arabidopsis thaliana, using public data from the 1,001 genomes project. We revealed associations between 33 (29) loci and flowering time at 10 (16)°C in this collection of natural accessions, where standard genomewide association analysis methods detected five (3) loci. The 33 (29) loci explained approximately 55.1 (48.7)\% of the total phenotypic variance of the respective traits. Our work illustrates how the genetic basis of highly polygenic adaptive traits in natural populations can be explored in much greater detail using new multilocus mapping approaches taking advantage of prior biological information, genome and transcriptome data.},
language = {en},
number = {4},
urldate = {2026-05-19},
journal = {Molecular Ecology Resources},
author = {Zan, Yanjun and Carlborg, Örjan},
year = {2018},
keywords = {Arabidopsis thaliana, expression QTL, flowering, genomewide association analysis, polygenic},
pages = {798--808},
}
@article{zan_relationship_2018,
title = {On the {Relationship} {Between} {High}-{Order} {Linkage} {Disequilibrium} and {Epistasis}},
volume = {8},
issn = {2160-1836},
url = {https://doi.org/10.1534/g3.118.200513},
doi = {10.1534/g3.118.200513},
abstract = {A plausible explanation for statistical epistasis revealed in genome wide association analyses is the presence of high order linkage disequilibrium (LD) between the genotyped markers tested for interactions and unobserved functional polymorphisms. Based on findings in experimental data, it has been suggested that high order LD might be a common explanation for statistical epistasis inferred between local polymorphisms in the same genomic region. Here, we empirically evaluate how prevalent high order LD is between local, as well as distal, polymorphisms in the genome. This could provide insights into whether we should account for this when interpreting results from genome wide scans for statistical epistasis. An extensive and strong genome wide high order LD was revealed between pairs of markers on the high density 250k SNP-chip and individual markers revealed by whole genome sequencing in the Arabidopsis thaliana 1001-genomes collection. The high order LD was found to be more prevalent in smaller populations, but present also in samples including several hundred individuals. An empirical example illustrates that high order LD might be an even greater challenge in cases when the genetic architecture is more complex than the common assumption of bi-allelic loci. The example shows how significant statistical epistasis is detected for a pair of markers in high order LD with a complex multi allelic locus. Overall, our study illustrates the importance of considering also other explanations than functional genetic interactions when genome wide statistical epistasis is detected, in particular when the results are obtained in small populations of inbred individuals.},
number = {8},
urldate = {2026-05-19},
journal = {G3 Genes{\textbar}Genomes{\textbar}Genetics},
author = {Zan, Yanjun and Forsberg, Simon K G and Carlborg, Örjan},
month = aug,
year = {2018},
pages = {2817--2824},
}
@article{zan_artificial_2017,
title = {Artificial {Selection} {Response} due to {Polygenic} {Adaptation} from a {Multilocus}, {Multiallelic} {Genetic} {Architecture}},
volume = {34},
issn = {0737-4038},
url = {https://doi.org/10.1093/molbev/msx194},
doi = {10.1093/molbev/msx194},
abstract = {The ability of a population to adapt to changes in their living conditions, whether in nature or captivity, often depends on polymorphisms in multiple genes across the genome. In-depth studies of such polygenic adaptations are difficult in natural populations, but can be approached using the resources provided by artificial selection experiments. Here, we dissect the genetic mechanisms involved in long-term selection responses of the Virginia chicken lines, populations that after 40 generations of divergent selection for 56-day body weight display a 9-fold difference in the selected trait. In the F15 generation of an intercross between the divergent lines, 20 loci explained {\textgreater}60\% of the additive genetic variance for the selected trait. We focused particularly on fine-mapping seven major QTL that replicated in this population and found that only two fine-mapped to single, bi-allelic loci; the other five contained linked loci, multiple alleles or were epistatic. This detailed dissection of the polygenic adaptations in the Virginia lines provides a deeper understanding of the range of different genome-wide mechanisms that have been involved in these long-term selection responses. The results illustrate that the genetic architecture of a highly polygenic trait can involve a broad range of genetic mechanisms, and that this can be the case even in a small population bred from founders with limited genetic diversity.},
number = {10},
urldate = {2026-05-19},
journal = {Molecular Biology and Evolution},
author = {Zan, Yanjun and Sheng, Zheya and Lillie, Mette and Rönnegård, Lars and Honaker, Christa F. and Siegel, Paul B. and Carlborg, Örjan},
month = oct,
year = {2017},
pages = {2678--2689},
}
@article{wang_bivariate_2017,
title = {Bivariate genomic analysis identifies a hidden locus associated with bacteria hypersensitive response in {Arabidopsis} thaliana},
volume = {7},
copyright = {2017 The Author(s)},
issn = {2045-2322},
url = {https://www.nature.com/articles/srep45281},
doi = {10.1038/srep45281},
abstract = {Multi-phenotype analysis has drawn increasing attention to high-throughput genomic studies, whereas only a few applications have justified the use of multivariate techniques. We applied a recently developed multi-trait analysis method on a small set of bacteria hypersensitive response phenotypes and identified a single novel locus missed by conventional single-trait genome-wide association studies. The detected locus harbors a minor allele that elevates the risk of leaf collapse response to the injection of avrRpm1-modified Pseudomonas syringae (P = 1.66e-08). Candidate gene AT3G32930 within the detected region and its co-expressed genes showed significantly reduced expression after P. syringae interference. Our results again emphasize that multi-trait analysis should not be neglected in association studies, as the power of specific multi-trait genotype-phenotype maps might only be tractable when jointly considering multiple phenotypes.},
language = {en},
number = {1},
urldate = {2026-05-19},
journal = {Scientific Reports},
publisher = {Nature Publishing Group},
author = {Wang, Biao and Li, Zhuocheng and Xu, Weilin and Feng, Xiao and Wan, Qianhui and Zan, Yanjun and Sheng, Sitong and Shen, Xia},
month = mar,
year = {2017},
keywords = {Plant genetics, Quantitative trait},
pages = {45281},
}
@article{zan_genetic_2016,
title = {Genetic {Regulation} of {Transcriptional} {Variation} in {Natural} {Arabidopsis} thaliana {Accessions}},
volume = {6},
issn = {2160-1836},
url = {https://doi.org/10.1534/g3.116.030874},
doi = {10.1534/g3.116.030874},
abstract = {An increased knowledge of the genetic regulation of expression in Arabidopsis thaliana is likely to provide important insights about the basis of the plant’s extensive phenotypic variation. Here, we reanalyzed two publicly available datasets with genome-wide data on genetic and transcript variation in large collections of natural A. thaliana accessions. Transcripts from more than half of all genes were detected in the leaves of all accessions, and from nearly all annotated genes in at least one accession. Thousands of genes had high transcript levels in some accessions, but no transcripts at all in others, and this pattern was correlated with the genome-wide genotype. In total, 2669 eQTL were mapped in the largest population, and 717 of them were replicated in the other population. A total of 646 cis-eQTL-regulated genes that lacked detectable transcripts in some accessions was found, and for 159 of these we identified one, or several, common structural variants in the populations that were shown to be likely contributors to the lack of detectable RNA transcripts for these genes. This study thus provides new insights into the overall genetic regulation of global gene expression diversity in the leaf of natural A. thaliana accessions. Further, it also shows that strong cis-acting polymorphisms, many of which are likely to be structural variations, make important contributions to the transcriptional variation in the worldwide A. thaliana population.},
number = {8},
urldate = {2026-05-19},
journal = {G3 Genes{\textbar}Genomes{\textbar}Genetics},
author = {Zan, Yanjun and Shen, Xia and Forsberg, Simon K G and Carlborg, Örjan},
month = aug,
year = {2016},
pages = {2319--2328},
}
@article{zan_genome-wide_2013,
title = {Genome-wide identification, characterization and expression analysis of {Populus} leucine-rich repeat receptor-like protein kinase genes},
volume = {14},
issn = {1471-2164},
url = {https://doi.org/10.1186/1471-2164-14-318},
doi = {10.1186/1471-2164-14-318},
abstract = {Leucine-rich repeat receptor-like kinases (LRR-RLKs) comprise the largest group within the receptor-like kinase (RLK) superfamily in plants. This gene family plays critical and diverse roles in plant growth, development and stress response. Although the LRR-RLK families in Arabidopsis and rice have been previously analyzed, no comprehensive studies have been performed on this gene family in tree species.},
language = {en},
number = {1},
urldate = {2026-05-19},
journal = {BMC Genomics},
author = {Zan, Yanjun and Ji, Yan and Zhang, Yu and Yang, Shaohui and Song, Yingjin and Wang, Jiehua},
month = may,
year = {2013},
keywords = {Expression profiling, Leucine-rich repeat receptor-like kinase (LRR-RLK), Motif elicitation, Phylogenetic analysis, Populus trichocarpa},
pages = {318},
}

