Function Annotation and Browse

Verified Alleles

RNA-seq Expression
Chen, JD., Zheng, C., Ma, JQ. et al. The chromosome-scale genome reveals the evolution and diversification after the recent tetraploidization event in tea plant. Hortic Res 7, 63 (2020).
Jiang, X., Zhao, H., Guo, F. et al. Transcriptomic analysis reveals mechanism of light-sensitive albinism in tea plant Camellia sinensis Huangjinju. BMC Plant Biol 20, 216 (2020).
Li, CF., Zhu, Y., Yu, Y. et al. Global transcriptome and gene regulation network for secondary metabolite biosynthesis of tea plant (Camellia sinensis). BMC Genomics 16, 560 (2015).
Wang, X., Feng, H., Chang, Y. et al. Population sequencing enhances understanding of tea plant evolution. Nat Commun 11, 4447 (2020).
Wang, XC., Zhao, QY., Ma, CL. et al. Global transcriptome profiles of Camellia sinensis during cold acclimation. BMC Genomics 14, 415 (2013).
Xia, EH., Zhang, HB., Sheng, J. et al. The tea tree genome provides insights into tea flavor and independent evolution of caffeine biosynthesis. Mol Plant 10, 866-877 (2017).
YK10

Method

Genome assembly:

The genome was assembled in several steps. First, PacBio reads were assembled automatically with Canu (v1.8). The resulting contigs were polished with Pilon (v1.23) using NGS data and then further assembled with HERA. The contigs were scaffolded using Saphyr optical mapping data and the Solve software package. Redundant sequences were removed with Redundans (redundans.py). The remaining sequences were clustered using Hi-C data and the 3D-DNA pipeline, and the assembly was manually reviewed and refined in Juicebox Assembly Tools. Finally, the genome was reassembled with 3D-DNA, yielding 15 anchored chromosomes.

Repeats annotation:

Tandem repeats were identified with Tandem Repeats Finder (TRF). Transposable elements (TEs) were identified by combining homology-based and de novo approaches. For homology-based prediction, known repeats were searched against Repbase with RepeatMasker and RepeatProteinMask. De novo prediction used LTR_FINDER, LTRharvest, LTR_retriever, and RepeatModeler. The identified repeats were classified with TEsorter using the REXdb database.
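The text does not say how repeat calls from the different tools were reconciled. One common step, assumed here purely for illustration, is to merge overlapping intervals so that a base annotated by several tools is counted only once. The interval layout (sequence name, 0-based start, exclusive end) and the function names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (sequence name, 0-based start, exclusive end).
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping or adjacent repeat intervals on each sequence."""
    by_seq: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for seq, start, end in intervals:
        by_seq[seq].append((start, end))

    merged: Dict[str, List[Tuple[int, int]]] = {}
    for seq, spans in by_seq.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlaps (or touches) the previous span: extend it.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[seq] = out
    return merged


def covered_bases(merged: Dict[str, List[Tuple[int, int]]]) -> int:
    """Total number of bases covered by the merged intervals."""
    return sum(end - start for spans in merged.values() for start, end in spans)


# Made-up calls from two tools on two sequences.
calls = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 200, 250), ("chr2", 10, 20)]
merged_calls = merge_intervals(calls)
total_repeat_bases = covered_bases(merged_calls)
```

Dividing the covered bases by the assembly length would give a non-redundant repeat fraction under these assumptions.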

Gene prediction and functional annotation:

Gene models were predicted with EVidenceModeler (EVM), which combined RNA-seq data, protein alignments, ab initio predictions, and homology-based evidence into the final gene set. Training data for the ab initio gene predictors (AUGUSTUS, GlimmerHMM, GENSCAN, and SNAP) were generated with PASA. Homology-based annotation used protein sequences from related species. Gene functions were assigned from the best-match alignment reported by eggNOG-mapper against the eggNOG 5.0 database.
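The idea of assigning a function from the best-match alignment can be sketched as follows. This is not eggNOG-mapper's own selection logic, only a minimal illustration that keeps, for each gene, the hit with the highest bit score (ties broken by the lower e-value). The record fields, gene IDs, and target names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """Hypothetical alignment record; not an eggNOG-mapper output format."""
    gene_id: str
    target: str
    evalue: float
    bit_score: float
    description: str


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep one hit per gene: highest bit score, then lowest e-value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.gene_id)
        if current is None or (hit.bit_score, -hit.evalue) > (
            current.bit_score,
            -current.evalue,
        ):
            best[hit.gene_id] = hit
    return best


# Made-up example hits for two hypothetical genes.
example_hits = [
    Hit("gene0001", "targetA", 1e-50, 210.0, "caffeine synthase-like"),
    Hit("gene0001", "targetB", 1e-20, 95.0, "methyltransferase"),
    Hit("gene0002", "targetC", 1e-30, 120.0, "chalcone synthase-like"),
]
annotation = {gene: hit.description for gene, hit in best_hits(example_hits).items()}
```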

RNA-seq:

Raw RNA-seq reads were stored in FASTQ format and processed with Trimmomatic (v0.32), which removed reads containing adapters, reads containing poly-N, and low-quality reads, yielding clean data for downstream analyses. The clean reads were aligned to the corresponding reference genome with TopHat2 using default settings. Gene expression levels were calculated with Cufflinks (v2.2.1). Fragments per kilobase of exon per million fragments mapped (FPKM) was used to normalize RNA-seq fragment counts and estimate the relative abundance of each gene. Differentially expressed genes (DEGs) were identified based on a P-value < 0.05 and at least a 2-fold change between the two FPKM values.
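The FPKM normalization and the DEG thresholds above can be sketched as follows. This is a minimal illustration, not Cufflinks' implementation: it uses the textbook FPKM definition and assumes the P-value comes from an upstream statistical test that the text does not name. The function names and the handling of zero expression are assumptions.

```python
import math


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million fragments mapped."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    # fragments / (length in kb) / (library size in millions)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def fold_change(fpkm_a: float, fpkm_b: float) -> float:
    """Fold change in either direction (always >= 1)."""
    high, low = max(fpkm_a, fpkm_b), min(fpkm_a, fpkm_b)
    if high == 0:
        return 1.0  # assumption: not expressed in either sample, no change
    if low == 0:
        return math.inf  # assumption: expressed in only one sample
    return high / low


def is_deg(fpkm_a: float, fpkm_b: float, p_value: float,
           min_fold: float = 2.0, max_p: float = 0.05) -> bool:
    """Apply the stated thresholds: P-value < 0.05 and at least 2-fold change."""
    return p_value < max_p and fold_change(fpkm_a, fpkm_b) >= min_fold


# Made-up numbers: 500 fragments on a 2,000 bp gene in a 20 million fragment library.
example_fpkm = fpkm(500, 2000, 20_000_000)
example_call = is_deg(example_fpkm, 5.0, p_value=0.01)
```

With these made-up numbers the gene has an FPKM of 12.5, a 2.5-fold difference from the comparison value of 5.0, so it passes both thresholds.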