
{"id":15927,"date":"2024-04-19T15:52:06","date_gmt":"2024-04-19T19:52:06","guid":{"rendered":"http:\/\/132.236.156.160\/cuccap\/?p=15927"},"modified":"2024-08-07T20:32:47","modified_gmt":"2024-08-08T00:32:47","slug":"genomics-and-bioinformatics-team-2024-progress-report","status":"publish","type":"post","link":"http:\/\/132.236.156.160\/cuccap\/2024\/04\/19\/genomics-and-bioinformatics-team-2024-progress-report\/","title":{"rendered":"Genomics and Bioinformatics Team | 2024 Progress Report"},"content":{"rendered":"<p><a href=\"http:\/\/132.236.156.160\/cuccap\/wp-content\/uploads\/sites\/6\/2024\/04\/CucCAP.compiledrep.2024a.pdf#page=12\"><strong>View the Genomics Team progress report, including all tables and figures, on pages 12&#8211;19 of the PDF version of this report.<\/strong><\/a><\/p>\n<h2>Genomics and Bioinformatics Team members:<\/h2>\n<ul>\n<li>Zhangjun Fei (Boyce Thompson Institute)<\/li>\n<li>Shan Wu (Boyce Thompson Institute)<\/li>\n<li>Amnon Levi (USDA, ARS)<\/li>\n<li>Yiqun Weng (USDA, ARS)<\/li>\n<li>Michael Mazourek (Cornell University)<\/li>\n<li>Jim McCreight (USDA, ARS)<\/li>\n<li>Rebecca Grumet (Michigan State University)<\/li>\n<\/ul>\n<h2>Objectives: Develop novel, advanced bioinformatic, pan-genome, and genetic mapping tools for cucurbits.<\/h2>\n<h3>1.1. Develop genomic and bioinformatic platforms for cucurbit crops.<\/h3>\n<h3>1.1.1. Development of high-resolution genotyping platforms for cucurbits.<\/h3>\n<p>Genome resequencing of the cucumber (388 accessions) and squash (207 Cucurbita pepo accessions) core collections has been completed. The average depths of the cleaned sequences for the cucumber and squash cores are 49.7\u00d7 and 49.9\u00d7, respectively. For the melon (384 accessions) and watermelon (372 accessions) cores, genome sequencing of 313 and 301 accessions, respectively, has been completed. We have also completed genome resequencing of 26 C. maxima and seven C. 
moschata accessions.<\/p>\n<p>The sequence data of the cucumber, squash, melon and watermelon cores have been processed for SNP and small indel calling using the Gy14 genome (v2.1), the MU\u2010CU\u201016 genome (v4.1), the 97103 genome (v2.5) and the DHL92 genome (v4) as the respective references. Statistics of the called variants are summarized in Table 1. Raw sequencing data and called variants have been shared with industry partners who requested access to the data. Biallelic variants with MAF &gt; 0.01 from the cucumber and squash core collections are publicly available for mining at CuGenDBv2 (<strong><a href=\"http:\/\/cucurbitgenomics.org\/v2\/genotype\" target=\"_blank\" rel=\"noopener\">cucurbitgenomics.org<\/a><\/strong>). Sample collection and DNA preparation are under way for the remaining accessions in the melon and watermelon cores, which will then be sequenced. Of the 71 watermelon core accessions still to be sequenced, DNA has been prepared for 45, while the other 26 did not germinate. Variants for the watermelon and melon cores will be updated once the new sequences are available.<\/p>\n<h4>Table 1 Summary of genome sequencing of cucurbit core collections<\/h4>\n<p>We recently found that 58 accessions in the cucumber core have high rates of missing SNP calls (5-35%) because of poor-quality sequencing libraries, which had been constructed during CucCAP1 using a low-cost protocol. These accessions are being resequenced; DNA has been prepared for 45 of them, while the remaining 13 did not germinate. Variants will be updated with the new sequences when available.<\/p>\n<h3>1.1.2. Development of novel, advanced genome and pan-genome platforms for cucurbit species.<\/h3>\n<p>For cucumber, we have selected 25 accessions for PacBio HiFi sequencing: five wild <em>Cucumis sativus<\/em> var. <em>hardwickii<\/em>, four semi-wild Xishuangbanna and 16 cultivated cucumbers. 
Ten of these 25 accessions are from the core collection. HiFi sequences have been generated for all 25 accessions, with an average depth of 33.4\u00d7.<\/p>\n<p>For watermelon, we selected a total of 135 accessions for reference-grade genome development, including one <em>Citrullus naudinianus<\/em>, one <em>C. rehmii<\/em>, two <em>C. ecirrhosus<\/em>, five <em>C. colocynthis<\/em>, 16 <em>C. amarus<\/em>, seven <em>C. mucosospermus<\/em>, five <em>C. lanatus<\/em> var. <em>cordophanus<\/em>, seven landraces, 82 cultivars and nine interspecific hybrids. HiFi sequences have been generated for all 135 accessions, with an average depth of 30.3\u00d7.<\/p>\n<p>For melon, a total of 27 representative accessions have been selected for HiFi sequencing, including 14 <em>C. melo<\/em> ssp. <em>melo<\/em> and 13 <em>C. melo<\/em> ssp. <em>agrestis<\/em> accessions; 13 are from India\/Pakistan, two from Turkey, three from the Americas, two from Africa, four from Central\/West Asia, two from East Asia and one from Europe. HiFi sequences have been generated for 22 of the 27 accessions, with an average depth of 33.7\u00d7.<\/p>\n<p>For squash, three <em>Cucurbita pepo<\/em> accessions, two from ssp. <em>texana<\/em> (also known as ssp. <em>ovifera<\/em>) and one from ssp. <em>pepo<\/em>, have been selected for HiFi sequencing, and HiFi sequences of all three have been generated. We have also generated HiFi sequences for <em>C. maxima<\/em> Rimu and <em>C. moschata<\/em> Rifu.<\/p>\n<h3>1.1.3. De novo genome assembly and pan-genome construction<\/h3>\n<p>We have completed chromosome-scale genome assemblies for the 25 cucumber accessions. The assembled genome sizes range from 259.0 Mb to 302.3 Mb (average: 287.43 Mb) and contig N50 sizes from 5.25 Mb to 22.98 Mb (average: 15.46 Mb). 
The BUSCO completeness of these genome assemblies ranges from 96.4% to 98.8%, with an average of 98.4%. An average of 95.5% of the contigs (range: 90.3% to 97.8%) are assigned to the seven cucumber chromosomes. Protein-coding genes have been predicted in these genomes, as well as in 11 previously published chromosome-scale cucumber genomes (seven cultivated, one Xishuangbanna and three wild <em>hardwickii<\/em>). The number of predicted genes ranges from 21,347 to 22,551, with an average of 21,870. The BUSCO completeness of the gene sets predicted from these 36 cucumber genome assemblies ranges from 93.0% to 97.0%, with an average of 96.0%. Using the newly assembled WI7631 (\u2018Chinese long\u2019) genome as the reference\/backbone, large structural variants (SVs) have been called for the other 24 newly assembled genomes and the 11 previously published genomes (<b>Table 2<\/b>). A graph pan-genome has been constructed from the WI7631 genome and the called SVs, and has been used to genotype these SVs in the core collection using the resequencing short reads.<\/p>\n<p>For watermelon, we have completed chromosome-scale genome assemblies and gene prediction for all 135 accessions. The assembled genome sizes range from 368.6 Mb to 406.7 Mb (average: 377.5 Mb) and N50 sizes are all greater than 20 Mb (20.37-35.64 Mb; average: 30.49 Mb). The BUSCO completeness of these genome assemblies ranges from 93.9% to 99.2%, with an average of 99.0%. An average of 99.2% of the contigs (range: 96.2% to 99.9%) are assigned to the 11 watermelon chromosomes. The number of predicted protein-coding genes ranges from 20,834 to 23,330 (average: 21,785). The BUSCO completeness of the gene sets predicted from these 135 watermelon genome assemblies ranges from 91.6% to 96.6%, with an average of 95.9%. 
Using the newly assembled \u201897103\u2019 genome as the backbone, SVs have been called in the other 134 watermelon accessions, as well as in three previously published long-read assemblies (Table 2). The final SVs and the \u201897103\u2019 genome have been used to construct a <em>Citrullus<\/em> graph pan-genome, which has been used to genotype these SVs in the core collection and other accessions using the resequencing short reads (a total of 756 accessions, including 436 cultivars, 114 landraces, 13 <em>cordophanus<\/em>, 39 <em>mucosospermus<\/em>, 120 <em>amarus<\/em>, 33 <em>colocynthis<\/em> and 1 <em>rehmii<\/em>).<\/p>\n<h4>Table 2 Summary statistics of SVs identified in cucumber and watermelon across 36 and 138 genome assemblies, respectively.<\/h4>\n<p>For melon, we have completed chromosome-level assemblies of 22 accessions. The assembled genome sizes range from 355.7 Mb to 387.0 Mb (average: 371.7 Mb) and contig N50 sizes from 9.41 Mb to 19.60 Mb (average: 13.85 Mb). The BUSCO completeness of these genome assemblies ranges from 93.7% to 97.9%, with an average of 97.3%. An average of 97.2% of the contigs (range: 92.4% to 99.5%) are assigned to the 12 melon chromosomes. Protein-coding genes have been predicted in 21 of the 22 assembled genomes, with 23,108 to 27,678 genes per genome (average: 24,570). The BUSCO completeness of the gene sets predicted from these 21 melon genomes ranges from 95.5% to 97.6%, with an average of 96.6%.<\/p>\n<p>For <i>Cucurbita<\/i> species, we have completed genome assemblies and gene prediction for three squash (<i>C. pepo<\/i>) accessions, and <i><span style=\"font-weight: 400\">C. 
maxima <\/span><\/i>Rimu and <i>C. moschata<\/i> Rifu (<b>Table 3<\/b>).<\/p>\n<h4>Table 3 Statistics of <i>Cucurbita<\/i> genome assemblies.<\/h4>\n<h3>1.1.4. Breeder-friendly web-based database for phenotypic, genotypic and QTL information<\/h3>\n<p>We have updated CuGenDB to version 2 (CuGenDBv2), which was officially released in April 2022. CuGenDBv2 currently hosts 34 reference genomes from 27 cucurbit species\/subspecies belonging to 10 genera. Protein-coding genes from all 34 genomes (total: 919,903; average: 27,056) have been comprehensively annotated, and the annotated genes can be queried and extracted in the database. Genomic synteny blocks and syntenic gene pairs have been identified between every pair of, and within each of, the 34 cucurbit genome assemblies (595 pairwise genome comparisons). In total, 391,379 synteny blocks and 12,130,719 syntenic gene pairs (average: 31 per synteny block) have been identified among the 34 cucurbit genomes. The \u2018Synteny Viewer\u2019 module has been re-implemented in CuGenDBv2 to process and display the large-scale synteny data more efficiently.<\/p>\n<p>A \u2018Genotype\u2019 module has been newly developed in CuGenDBv2. The module provides a suite of functions that allow users to mine, analyze, extract, and download variants, including SNPs and small indels, from large-scale population genome sequencing projects. Currently, the variants (SNPs and small indels) called for the cucumber and squash core collections and the watermelon resequencing panel, as well as SNPs called from the GBS data generated under CucCAP1 for watermelon, melon, cucumber, <i><span style=\"font-weight: 400\">C. 
pepo<\/span><\/i>, <i>C. maxima<\/i> and <i>C. moschata<\/i>, are available in the database for query and mining.<\/p>\n<p>The \u2018Expression\u2019 module in CuGenDBv2 has been redesigned to provide a comprehensive cucurbit gene expression atlas built from publicly available cucurbit RNA-Seq datasets. Raw RNA-Seq data from 221 projects, comprising 1,513 distinct samples and 3,560 runs (libraries), have been downloaded from NCBI and processed to derive expression values. These values can be queried in CuGenDBv2 to display the expression profiles of genes of interest across tissues and developmental stages and under different treatment conditions.<\/p>\n<p>Phenotype data have been generated for the melon and cucumber core collections. For the melon core collection, a total of 33 vegetative, flower and fruit characters and two disease resistance traits have been evaluated. For the cucumber core collection, 15 external and internal characteristics have been recorded for immature and mature fruit of plants grown in 2019 and 2021. A tool to display fruit images of the cucumber core accessions has been developed (<strong><a href=\"http:\/\/www.cucurbitgenomics.org\/cgi-bin\/core?pid=P04\" target=\"_blank\" rel=\"noopener\">cucurbitgenomics.org<\/a><\/strong>). Additional tools to visualize and analyze the phenotypic data will be developed in CuGenDBv2.<\/p>\n<h3>1.2 Perform seed multiplication and sequencing analysis of core collections of the four species, and provide community resources for genome-wide association studies (GWAS).<\/h3>\n<h3>1.2.1. 
Seed multiplication of core collections<\/h3>\n<p>For cucumber, seed increases of the 388 accessions in the core collection were carried out by five participating seed companies. As of March 2024, seeds of 310 accessions, with more than 1,000 seeds per accession, have been received.<\/p>\n<p>For watermelon, HM.Clause is increasing seed of 293 core collection accessions provided by USDA, ARS. HM.Clause has already shipped S3 seeds of 177 accessions (about 1,000 seeds per accession) to the USDA, ARS, U.S. Vegetable Laboratory and, during 2024, will ship S3 seeds of the other 116 accessions it committed to increase. The University of Georgia will send S2 seed of an additional 39 accessions to HM.Clause for increase. During 2024, S2 seeds of an additional 167 PIs (mainly <em>Citrullus amarus<\/em>) will be increased at the USDA, ARS, U.S. Vegetable Laboratory to obtain 500 S3 seeds per accession.<\/p>\n<p>Three companies assisted in advancing the melon core set in 2023: 259 of the 384 melon core lines were sent to three seed company cooperators, and seed was obtained from 180 of those lines. United Genetics advanced 13 S0 lines to S1 and three S1 lines to S2. Nunhems advanced 13 S0 lines to S1 (Table 4). Sakata advanced 151 S2 lines to S3, with estimated seed counts per line (based on seed weight) ranging from 21 to 3,100; only 57 lines produced 1,000 or more S3 seeds (Table 5).<\/p>\n<h4>Table 4 Seed multiplication status of melon core<\/h4>\n<h4>Table 5 Estimated number of seeds per S3 melon core line (based on seed weight) by Sakata<\/h4>\n<p>For the <em>C. pepo<\/em> squash core increase, we expect to receive the last of the seed this summer. The entire squash core will be increased by a professional nursery, Villa Plants, and will have robust phytosanitary documentation. One line may be subject to IP restrictions and may be dropped from the core.<\/p>\n<h3>1.2.2. 
Population genetics and phenotype-genotype association analysis<\/h3>\n<p>Phylogenies of the accessions in the cucumber, melon, squash, and watermelon cores have been inferred using LD-pruned SNPs at fourfold degenerate sites. The phylogenies of the cucumber and melon core accessions are largely consistent with their geographic origins, and the phylogeny of the watermelon accessions is consistent with their species classifications, whereas the squash accessions show no clear separation by geographic origin or improvement status.<\/p>\n<p>Phenotype-genotype association analysis has been performed for the cucumber core. The cucumber core accessions were grown in the field at the Michigan State University Horticulture Teaching and Research Center in 2019-2022. Young and mature fruits were harvested at ~5-7 and 30-40 days post-pollination, respectively. The following traits were measured for mature fruit: fruit length, diameter, fruit shape index, carpel number, seed cavity, flesh thickness, hollowness, curvature, tapering, skin color, flesh color and netting; and for young fruit: fruit shape index, curvature, tapering, skin color, and spine density. Genome-wide association studies (GWAS) were performed on these fruit traits using several models, including FarmCPU, BLINK, MLMM, and MLM (Fig. 1). Chromosomal locations of the significantly associated SNPs are illustrated in Fig. 2. QTLs for some traits were tightly clustered. For example, significant SNPs for several highly correlated fruit size and shape traits, including mature fruit length, young fruit shape index, carpel number, and seed cavity size, were located close together on chromosome 1 at ~10 Mb. Multiple external fruit traits, such as netting, spine density and young fruit color R\/G values, also mapped to the same region on chromosome 1. 
Several significant SNPs identified by GWAS were also located close (within 1 Mb) to previously identified fruit trait QTL and candidate genes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>View the Genomics Team progress report including all tables and figures in pages 12 &#8211; 19 of the pdf version of this report.<\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"quote","meta":{"footnotes":""},"categories":[270],"tags":[541,457,187,470,335,460,617],"_links":{"self":[{"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/posts\/15927"}],"collection":[{"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/comments?post=15927"}],"version-history":[{"count":9,"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/posts\/15927\/revisions"}],"predecessor-version":[{"id":16733,"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/posts\/15927\/revisions\/16733"}],"wp:attachment":[{"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/media?parent=15927"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/categories?post=15927"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/132.236.156.160\/cuccap\/wp-json\/wp\/v2\/tags?post=15927"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}