PhD Student Bastian Schiffthaler (Photo: Alena Aliashkevich)
Most plant and animal traits arise from complex interactions between genes, proteins and metabolites. Identifying and analysing these complex genetic traits is very challenging, especially when the sequenced genomes are fragmented. Bastian Schiffthaler, PhD student in Nathaniel Street’s group, improved the genome assembly of European aspen and developed bioinformatic tools that help to analyse complex genetic traits in plants. He successfully defended his PhD thesis at Umeå University today, on the 12th of June.
To sequence a genome, the DNA is normally cut into small pieces and the sequence of each piece is read. Bioinformatic software then assembles the complete sequence in an iterative process, using the regions where these small pieces overlap, which ideally yields full-length chromosomes. Trees often have very complex genomes, so most available tree genome assemblies are not very contiguous: they consist of many separate fragments rather than whole chromosomes. Bastian Schiffthaler worked on improving the contiguity of such genomes, focusing on European aspen.
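The overlap idea can be illustrated with a toy example. The Python sketch below is a simplified, hypothetical illustration, not how modern assemblers or the aspen project actually work: it greedily merges the pair of DNA fragments with the longest exact suffix-prefix overlap until no overlaps remain. Real assemblers build overlap or de Bruijn graphs and must cope with sequencing errors and repeats, which this sketch ignores. The names `overlap` and `greedy_assemble`, the `min_len` threshold and the reads are made up for the example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap; return the resulting contigs."""
    frags = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            # No fragment pair overlaps any more: what is left are separate contigs.
            return frags
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)


# Three short, made-up reads that each overlap the next by four bases
print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGGAT"]))  # ['ATGGCGTACGGAT']
```

Fragments that share no overlap with anything else simply stay separate, which is what a fragmented, less contiguous assembly looks like.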
When Bastian Schiffthaler started, the genome sequence of European aspen was already quite good compared with, for example, Norway spruce. However, it was still fragmented, which made it difficult to carry out analyses that depend on a highly contiguous assembly. Examples are the detection of DNA signatures associated with traits through genome-wide association studies, or the study of evolutionary history by looking at large-scale genomic rearrangements. “Our strategy combined modern long-read sequencing, polished with highly accurate short-read data, with an optical map and a genetic map to link the initially assembled scaffolds into fully assembled chromosomes. With close to 20,000 genetic markers, the genetic map is one of the most comprehensive ones created for any organism to date. This was an overwhelming mass of information that most commonly used free software programmes were not able to handle.”
Ordering markers on a genetic map is a classic application of the travelling salesman problem, which aims to find the shortest route through a set of points or locations. Finding the perfect order for just sixty markers by checking every possibility would take more calculations than there are atoms in the universe, so all software relies on approximations, but even these were too slow for a dataset of this size. To overcome this problem, Bastian Schiffthaler developed “BatchMap”, a software package that speeds up the computations required to find the order of genetic markers with the highest likelihood given their inheritance patterns.
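The scale of the problem and the kind of approximation involved can be sketched in a few lines of Python. This is a hypothetical illustration, not BatchMap's algorithm: it treats a simple pairwise distance between markers (standing in for recombination-based map distances) as the travelling salesman's travel cost and applies two textbook heuristics, nearest-neighbour construction followed by 2-opt improvement, whereas real mapping software searches for the marker order with the highest likelihood. The marker positions and function names are made up for the example.

```python
import math


def n_orders(n: int) -> int:
    """Distinct marker orders: n! permutations, halved because a map read backwards is the same map."""
    return math.factorial(n) // 2


def path_length(order: list[int], dist: list[list[float]]) -> float:
    """Total distance along an open path (a linkage map has two ends, not a closed tour)."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def nearest_neighbour(dist: list[list[float]], start: int = 0) -> list[int]:
    """Greedy start: always step to the closest marker not yet placed."""
    order, remaining = [start], set(range(len(dist))) - {start}
    while remaining:
        nxt = min(remaining, key=lambda j: dist[order[-1]][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def two_opt(order: list[int], dist: list[list[float]]) -> list[int]:
    """Reverse segments while doing so shortens the path (local search, not a guaranteed optimum)."""
    best, best_len = order[:], path_length(order, dist)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = path_length(cand, dist)
                if cand_len < best_len - 1e-12:
                    best, best_len, improved = cand, cand_len, True
    return best


# Hypothetical data: six markers with made-up positions (cM); distance = |difference|
positions = [0.0, 5.0, 1.0, 3.0, 2.0, 4.0]
dist = [[abs(p - q) for q in positions] for p in positions]
order = two_opt(nearest_neighbour(dist), dist)
print(order, path_length(order, dist))  # [0, 2, 4, 3, 5, 1] 5.0
print(f"{n_orders(60):.2e}")  # 4.16e+81
```

For sixty markers, `n_orders(60)` is about 4.2 × 10^81, more than the roughly 10^80 atoms commonly estimated for the observable universe, which is why heuristics like these are used instead of exhaustive search.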
BatchMap divides the calculations into small batches that are easy to compute and can run in parallel. This drastically reduced the calculation time, and Bastian Schiffthaler could produce a dense map of genetic signatures on the European aspen chromosomes. BatchMap has since been adopted by other genome projects, such as those assembling the Norway spruce genome and a strawberry genome that comprises eight chromosome sets.
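The batching idea can be sketched as follows. In this hypothetical Python example (our own simplification, not BatchMap's actual procedure), a rough marker order is cut into small batches; within each batch the first and last markers stay fixed and the interior markers are reordered exhaustively. Because the batch ends never move, the links between batches are unchanged, so every batch can be solved independently and in parallel without making the overall map longer. The positions, starting order and function names are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import permutations


def path_length(order: list[int], dist: list[list[float]]) -> float:
    """Total distance along an open path of markers."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def refine_batch(batch: list[int], dist: list[list[float]]) -> list[int]:
    """Try every order of the interior markers; keep the batch's two end markers fixed."""
    if len(batch) <= 3:
        return list(batch)  # at most one interior marker: nothing to reorder
    first, last, inner = batch[0], batch[-1], batch[1:-1]
    best = min(permutations(inner), key=lambda p: path_length([first, *p, last], dist))
    return [first, *best, last]


def batched_refine(order: list[int], dist: list[list[float]],
                   batch_size: int = 6, workers: int = 4) -> list[int]:
    """Split the order into consecutive batches and refine them independently in parallel."""
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        refined = list(pool.map(partial(refine_batch, dist=dist), batches))
    return [marker for batch in refined for marker in batch]


# Hypothetical data: marker i sits at position i cM; the starting order is locally scrambled.
positions = [float(i) for i in range(12)]
dist = [[abs(p - q) for q in positions] for p in positions]
rough = [0, 3, 1, 4, 2, 5, 6, 9, 8, 10, 7, 11]
refined = batched_refine(rough, dist)
print(path_length(rough, dist), "->", path_length(refined, dist))  # 27.0 -> 11.0
print(refined)
```

A thread pool keeps the example simple; for CPU-bound work in Python, `ProcessPoolExecutor` would give real parallel speed-ups, which is why the worker is passed as a `functools.partial` of a top-level function rather than as a lambda.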
“We wanted to evaluate our improved assembly in the context of genome-wide association studies to look for genes that are involved in salicinoid metabolism. These metabolites are found only in Populus and Salix species and help to protect the plant against herbivores,” explains Bastian Schiffthaler. “When compared to previous attempts using the more fragmented assembly, we could see that our new genome version improved the analysis of this complex trait a lot and we were able to gain new insights into the evolution of the different Populus species.”
Identifying the genes that control complex traits is very challenging. Bastian Schiffthaler and his colleagues studied leaf shape variation in European aspen, a complex trait that is inherited from the parents but still highly diverse between individuals. Their results show that leaf shape is controlled by a complex network of many different genes, with each individual gene often exerting only a minor influence on the final leaf shape.
Bastian Schiffthaler believes that better understanding how traits like leaf shape work requires an integrative approach, in which traits are analysed at all the stages that contribute to their emergence. He therefore developed “Seidr”, a toolkit for studying the interactions among genes that are actively expressed, that is, transcribed and translated into protein, within an organism. He hopes that integrating “Seidr” with other layers of data will enable scientists to better predict complex traits in the future.
About the public defence:
The public defence took place on Friday, 12th of June, at Umeå University. The faculty opponent was Marek Mutwil from the School of Biological Sciences at Nanyang Technological University in Singapore, who participated remotely. The supervisor was Nathaniel Street. The defence was broadcast live, and interested people could participate via Zoom.
Title of the thesis: Embracing the Data Flood – Integrating Diverse Data to Improve Phenotype Association Discovery in Forest Trees.
Link to the thesis in DIVA: http://umu.diva-portal.org/smash/record.jsf?pid=diva2%3A1429905&dswid=8163
For more information, please contact
Department of Plant Physiology
Umeå Plant Science Centre