[Presentation/2017] Plant Genome Era: Understanding plant genomes from the sequences
The term genome, a blend of "gene" and "chromosome", refers to the whole nucleotide sequence of an organism. The Arabidopsis thaliana genome (119.15 Mbp) was the first plant genome to be fully sequenced, in 2000, using the Sanger sequencing method. After this genome, two rice genomes (Oryza sativa subsp. japonica and O. sativa subsp. indica) were released (374.47 Mbp and 395.82 Mbp, respectively). Subsequently, the tree genomes of Populus trichocarpa (434.13 Mbp) and Vitis vinifera (486.27 Mbp) were also successfully sequenced. In 2009, the cucumber (Cucumis sativus; 243.57 Mbp) genome was sequenced and assembled with the aid of one of the next generation sequencing (NGS) technologies (Illumina), which decreased sequencing cost dramatically. This proved that plant genomes can also be analyzed at low cost. Currently, more than 100 plant genomes have been sequenced; however, there is no central repository for plant genome sequences. Even NCBI does not currently cover all published plant genomes. In addition, four tomato genomes have been sequenced for de novo assembly, but no assembled sequences are publicly available. These problems are critical hurdles for understanding plant genomes in various aspects. Moreover, many re-sequencing projects have been conducted; however, the datasets available from these projects are usually limited to raw data and BAM files. A standardized plant genome database has been established to overcome these problems systematically. Currently, 147 plant genomes (95 species) have been collected from diverse sources including NCBI, Phytozome, Ensembl, and many independent plant databases. The total length of the 147 genomes is 141.35 Gb (976 Mb on average), and the total number of ORFs is 5,691,580 from 136 plant genomes. One hundred and twelve species consist of 12 green algal species, three mosses, one fern, three gymnosperm species, and 99 angiosperm species. Twenty-three angiosperm orders have sequenced genomes; among them, Brassicales has 32, Poales 26, and Rosales 14 genomes.
Excluding green algal species, the smallest genome is that of Capsella grandiflora (105.35 Mb) and the largest is that of Pinus lambertiana (34.08 Gb), which is about 323 times the size of the C. grandiflora genome. The total size of the three sequenced gymnosperm genomes is 72.20 Gb (55.37%). Twelve out of 118 genomes (10.17%; referred to as large plant genomes) are more than 1 Gb long, indicating that large plant genomes are still difficult to sequence. As expected, there is no significant correlation between genome length and number of ORFs, whereas the GC ratio of plant genomes is correlated with taxonomy. For example, the GC ratio of 10 green algal genomes ranges from 52.90% to 67.14%, while that of 30 monocot genomes (Poales and Alismatales) ranges from 40.49% to 46.89% and that of 94 dicot genomes ranges from 32.31% to 40.26%. These analyses show that the 141.35 Gb of plant genome sequence is not just a collection of nucleotides but a source of new indicators for understanding the characteristics of plants based on taxonomy.
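The GC ratio and the size comparisons above are simple to compute from sequence data. The following is a minimal Python sketch, not the pipeline used for the database: the function names `gc_ratio` and `fold_difference` are hypothetical, ambiguous bases such as N are assumed to be excluded from the GC calculation, and 1 Gb is assumed to equal 1,000 Mb.

```python
def gc_ratio(sequence: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Assumption: only A, C, G, and T are counted; ambiguous symbols
    such as N are ignored rather than counted as non-GC.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


def fold_difference(larger_mb: float, smaller_mb: float) -> float:
    """Return how many times larger one genome is than another (both in Mb)."""
    return larger_mb / smaller_mb


# Sizes quoted in the abstract, converted to Mb (assuming 1 Gb = 1,000 Mb).
pinus_mb = 34.08 * 1000      # Pinus lambertiana
capsella_mb = 105.35         # Capsella grandiflora

print(f"GC ratio of toy sequence: {gc_ratio('GGCCAATT'):.2f}%")
print(f"Fold difference: {fold_difference(pinus_mb, capsella_mb):.1f}")
print(f"Share of genomes over 1 Gb: {100 * 12 / 118:.2f}%")
```

For a whole genome, the same function would be applied to the concatenated sequences of an assembly, for instance after reading them from a FASTA file.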
2018-09-09
Jongsun Park
