Genome assembly and annotation of P. strobi: a story of genome size expansion driven by repeats

Photo Credit: Justin G. A. Whitehill, North Carolina State University, USA

Beetles (Coleoptera) are the largest order of insects, representing more than 400,000 extant species. Among the beetles, the Curculionidae (or “true weevils”) are a heterogeneous taxon of more than 60,000 species that includes some of the world’s most devastating forest and agricultural pests. Weevils belong to the Phytophaga, and their diversity has been ascribed to coevolution with plants. This coevolution has generated a “cascade of evolutionary innovations” that has been attributed to trophic interactions with both plant hosts and microorganisms.

The spruce weevil (Pissodes strobi) is a highly destructive pest of North American conifers, and screening and monitoring programs for it cost millions of dollars annually. In western North America its primary hosts are Sitka (Picea sitchensis), white (P. glauca), Engelmann (P. engelmannii), and hybrid (P. glauca × engelmannii × sitchensis) spruces. The spruce weevil’s annual life cycle can be divided into two major phases, the exophase and the endophase. During the exophase, adult weevils live on the outside of the tree and feed on its bark without causing substantial damage to the host. The endophase begins after the female deposits her eggs into oviposition holes at the tip of the apical shoot. Inside the tree, larval feeding disrupts the flow of water and nutrients and kills the apical shoot. The spruce weevil is most destructive during the endophase, which continues until pupation and emergence from the tree as an adult. Damage from spruce weevil larvae results in stunted and deformed growth, and repeated infestation can kill the tree. The identification of genetic resistance is the primary means of managing the spruce weevil in a forest landscape. Spruce resistance to the weevil relies on cortical stone cells (i.e., highly lignified cells) acting synergistically with oleoresin terpenes, forming a robust defense syndrome that inhibits larval survival.

Wolbachia spp. are the most prevalent group of endosymbiotic bacteria, associating with over 60% of insect species. Because Wolbachia lives in close contact with host reproductive tissues, it can play a crucial role in host development and reproduction. Wolbachia is often transmitted vertically, directly from the germ line to the offspring, but it can also spread horizontally between insect species. Wolbachia takes an active part in the evolution of its hosts by interfering with their reproduction and facilitating rapid gene selection.

We assembled the nuclear and mitochondrial genomes of the spruce weevil from a single pupa sequenced with 10x Chromium linked-read technology at 53-fold coverage. We also present the genome assembly and annotation of a Wolbachia sp., assembled from the spruce weevil linked reads; this putative endosymbiont generally forms parasitic relationships with its host. To date, eight other Curculionidae genomes have been released, five of them in the past year: the coffee borer beetle, Hypothenemus hampei; the Argentine stem weevil, Listronotus bonariensis; the red palm weevil, Rhynchophorus ferrugineus; the oil palm pollinating weevil, Elaeidobius kamerunicus; the mountain pine beetle, Dendroctonus ponderosae; the Easter Egg weevil, Pachyrhynchus sulphureomaculatus; the Eurasian spruce bark beetle, Ips typographus; and the rice weevil, Sitophilus oryzae.
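As a rough illustration of what 53-fold coverage implies, mean fold coverage is the total number of sequenced bases divided by the genome size. The minimal sketch below assumes the ~2 Gbp nuclear genome size reported for this species and treats coverage as raw mean depth; linked-read pipelines may report an effective coverage computed differently, so the function names and the figure are illustrative only.

```python
def sequencing_yield_bp(genome_size_bp: float, fold_coverage: float) -> float:
    """Approximate bases sequenced: yield = genome size x mean fold coverage."""
    return genome_size_bp * fold_coverage


def fold_coverage(total_bases_bp: float, genome_size_bp: float) -> float:
    """Inverse relation: mean fold coverage = bases sequenced / genome size."""
    return total_bases_bp / genome_size_bp


# Assumption: ~2 Gbp genome at 53-fold coverage.
yield_bp = sequencing_yield_bp(2e9, 53)
print(f"{yield_bp / 1e9:.0f} Gbp")  # 106 Gbp
```

Under these assumptions, a single pupa had to yield on the order of 100 Gbp of sequence, which is one practical consequence of the large genome size.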

Apart from the Easter Egg and Argentine stem weevils (~2 Gbp and ~1.1 Gbp, respectively), the Curculionidae genomes available to date are compact, spanning a few hundred Mbp. A comparative genomics analysis highlighted the relatively large genome of the spruce weevil. Agreement between experimental and in silico analyses supports a nuclear genome size of ~2 Gbp for the spruce weevil. Interestingly, the genome contains a large number of annotated transposable elements (TEs), whose abundance correlates positively with genome size in arthropods and may have driven the evolution of genome size. Another interesting aspect of the spruce weevil genome is the timing of TE expansion and the identification of active TE classes. The Kimura repeat landscape highlights a recent expansion event that could have dramatically reshaped the structure of the spruce weevil genome. Most TE copies show low divergence, which indicates a higher rate of repeat replication and turnover than in other Curculionidae. DNA and LTR repeats show similar divergence estimates in the spruce weevil, which suggests a related turnover event. Finally, the “Unknown” TEs, which cover about 6% of the genome, show a steady expansion that also culminates in a recent peak.
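A Kimura repeat landscape plots how much of the genome is covered by TE copies at each level of divergence from their consensus sequence, so low divergence points to recent insertion. The minimal sketch below computes the Kimura two-parameter (K2P) distance, d = -1/2 ln(1 - 2p - q) - 1/4 ln(1 - 2q), where p and q are the proportions of transitions and transversions, and then bins copies into a landscape histogram. It assumes pre-aligned consensus/copy pairs given as plain strings; the function names are hypothetical, and repeat-annotation tools such as RepeatMasker apply further adjustments (for example at CpG sites) that are not reproduced here.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """K2P distance between two aligned sequences (gaps/ambiguous bases skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps ('-') and ambiguous bases such as 'N'
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def landscape(pairs, bin_width: float = 1.0) -> dict:
    """Aligned copy length (bp) per divergence bin, in percent K2P distance."""
    hist = Counter()
    for consensus, copy in pairs:
        d_percent = 100 * kimura_2p(consensus, copy)
        bin_index = int(d_percent // bin_width)
        # weight each copy by its number of aligned (non-gap) bases
        hist[bin_index] += sum(1 for c in copy.upper() if c in "ACGT")
    return dict(sorted(hist.items()))


pairs = [("ACGTACGTAC", "ACGTACGTAC"), ("ACGTACGTAC", "GCGTACGTAC")]
print(landscape(pairs))  # {0: 10, 11: 10}
```

In a landscape built this way, a recent expansion shows up as a large share of TE-derived bases piled into the lowest-divergence bins.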

The nearly complete genome assembly of the putative Wolbachia endosymbiont spans ~1.2 Mbp and contains a large number (1,588) of complete annotated genes. A phylogenetic analysis based on single-copy orthologs places this putative endosymbiont in supergroup A, which is characterized by reproductive parasitism. The diversity of species in supergroup A indicates that closely related Wolbachia strains are found in a diverse array of host species. Further studies and targeted experiments could reveal the specific mechanisms involved in this process.

Compared with existing genomic resources for closely related species, the spruce weevil genome is complex, repeat-rich, and significantly larger than those of other sequenced pests. Future work should focus on the expansion of genome size and explore how geographically separated populations of spruce weevil compare. Given the wide host range and large geographic footprint of the spruce weevil, a comprehensive analysis of individuals adapted to different hosts and regions could reveal clues about the genome size expansion within this curculionid species.