An overview of eDNA-related reference libraries

What is an eDNA reference library?

All living things can be identified through genes or DNA barcodes: genetic markers, each a region of DNA that is unique to a species. Even though some species share similar genetic information, each can be distinguished by DNA sequences specific to it, like a genetic fingerprint.

Scientists can record the unique DNA sequence of a species, creating an ID for it. When these IDs are taxonomically identified and added to a curated collection, the result is a DNA reference library. Scientists use these collections to compare the DNA they collect, which gives them an easier and quicker “check” of, for example, which species are found in an environment.

Genetic markers are crucial for DNA reference libraries. A genetic marker is a fragment of DNA tied to a specific location within the genome (all the genetic material of an organism), which allows a particular genetic sequence to be identified even when the source of the DNA is unknown. In the context of eDNA, knowing a genetic marker is like giving a scientist special glasses to see which organisms are out there without actually having seen them.

Environmental DNA (eDNA) is DNA gathered from the environment, for example from water, soil, or air. These samples contain genetic information from more than one organism or species. Once a sample is collected, scientists can compare the barcodes it contains against a reference library, thereby identifying which species are present in the sample and, by extension, in that ecosystem.
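The comparison step described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular library or pipeline works: the species names, the sequences, the `identity` and `assign_species` helpers, and the 0.9 similarity threshold are all hypothetical, and real workflows rely on sequence alignment tools rather than position-by-position comparison.

```python
# Toy reference library: species name -> barcode sequence.
# Names and sequences are made up for illustration only.
REFERENCE_LIBRARY = {
    "Species A": "ACGTACGTAC",
    "Species B": "ACGTTCGTAA",
    "Species C": "TTGGCCAATT",
}


def identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def assign_species(query, library, min_identity=0.9):
    """Return the best-matching species, or None if nothing is similar enough."""
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = identity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_identity else None


# Sequences recovered from one hypothetical environmental sample.
sample = ["ACGTACGTAC", "TTGGCCAATA", "GGGGGGGGGG"]
detected = [assign_species(seq, REFERENCE_LIBRARY) for seq in sample]
print(detected)  # ['Species A', 'Species C', None]
```

The third sequence has no close match, so it stays unassigned. This mirrors why the coverage of a reference library matters: a species whose barcode is missing from the library cannot be identified from the sample.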

Source: eDNA reference library. Smithsonian National Museum of Natural History. (n.d.).

An overview of aquatic eDNA-related reference libraries

The eDNAqua-Plan project is mapping aquatic eDNA reference libraries. According to its findings, most reference libraries store data on multiple DNA barcodes or molecular markers. Nearly half store 16S rRNA data, followed by COI and 18S rRNA.

The Barcode of Life Data System (BOLD)

BOLD indexes records of genetic information for species from around the world, which can be retrieved through its portal.

Genetic markers supported

ITS2, rbc, rbcLa, matK, trnH-psbA, trnL-F, atpF-atpH, rpoB, rpoC1, COX1, COX2, COX3, CytB, ND1, ND2, ND3, D-loop, DYN, VDAC, rnL, rbcL-like, ycf1, trnK, atpB-rbcL, PSBA, matK-trnK, petD-intron, PY-IGS, rpL32-trnL, ETS-1, cp-RPS16-intro, 5-8S, COX1-LIKE, CAD, CAD4, PGD, CAD1, TPI, COX1NMT1, 28S-D2-D3, CK1, PER, RBM15, TULP, 28S-D1-D2, ACE2, CQ11, 28S-D3-D5, ND4, ND4L, ND5-0, ND6, 28S rRNA, NBC-COX1, 18S rRNA, 5S-23S, atp6, matR, matK-like, psbK-psbI, 16S rRNA, PsbA-like, COX1-PSEUDO, MutS-like, 16S rRNA-ND2, COX2-COX1, ND4L-MSH, ND6-ND3, ADR, TYR, 28S-D3, Wnt1, 28S-D2, CADH, coxA, ftsZ, sca4, ntcA, PSA, psbC, e-gene, tufA, EF2, 12S rRNA, CHD-Z, MB2-EX2-3, MC1R, COX1-NUMT, H3, IRBP, DBY-EX7-8, RAG2, R35-intron, NGFB, R35, Rho, PKD1, TMO-4C4, PLAGL2, RAG1, S7, atp6-atp8, RNF213, ENC1, MYH6, EF1-alpha, RPB2, ApsaB, COX2-COX2I, RBCL-5P, psbA-3P, RPB1, Beta-tubulin, AOX-fmt, mtSSU, trnD-trnY-trnEA, FL-COX1, CAD-5P, GAPDH, fbpA, IDH, MDH, DDC, ENO, ArgK, RpS2, RpS5, DDX23, WSP, RPL37, EF1-alpha-5P, CsIV, HfIV, COX1NMT2, CYTB-NUMT, H3-NUMT, LWRHO, CHOLC, H4, 28S-D7, 28S-D9-D10, ATP1A, EPRS, TS, ATP8, RpS4-trnS, nucLSU, UPA, Msp4, hcpA.

https://boldsystems.org

MIDORI

MIDORI version 2, the new version of MIDORI, is a reference library of DNA and amino acid sequences used for taxonomic assignment of Eukaryota mitochondrial DNA sequences. Eukaryotes are organisms whose cells have a membrane-bound nucleus; they include all animals, plants, fungi, seaweeds, and many unicellular organisms.

Genetic markers supported

COX1, COX2, COX3, 12S rRNA, 16S rRNA, A6, A8, CytB, ND1, ND2, ND3, ND4, ND4L, ND5, ND6.

MetaZooGene Atlas & Database (MZGdb)

MZGdb is a reference database that summarizes the presence and barcoding status of major marine fauna and flora groups and species, reported by geographic region, ocean, and sea. MZGdb also provides data for any of these taxonomic or geographic subsets, which can improve sequence matching and reduce processing times.

Genetic markers supported

COX1, 16S rRNA, 18S rRNA, 12S rRNA, 28S rRNA.

Phytool

This database focuses on freshwater phytoplankton.

Genetic markers supported

16S rRNA, 18S rRNA, 23S rRNA, rbcL.

Diat.barcode

This reference library centres on diatoms, a type of microalgae. Diatoms are major contributors to the productivity of oceans and freshwaters, and they serve as ecological indicators of the health of aquatic ecosystems.

Genetic markers supported

COX1, 18S rRNA, ITS2, rbcL.

UNITE

A database and sequence management environment centred on the eukaryotic nuclear ribosomal ITS region.

Genetic markers supported

ITS1, ITS2, ITS3 and ITS4.

SILVA

SILVA is an rRNA database project that stores ribosomal RNA sequence data for Bacteria, Archaea and Eukarya.

Genetic markers supported

16S rRNA, 18S rRNA, 28S rRNA.

PR2 primer database

The PR2 primer database is an interactive database of eukaryotic rRNA primers and primer sets for metabarcoding studies compiled from the literature. One of four domains can be selected in the database: Archaea, Bacteria, Eukaryota and Eukaryota:plas (corresponding to Eukaryota plastid sequences).

Genetic markers supported

18S, 23S, 28S, 5S, ITS, ITS1, ITS2.
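Taken together, the marker lists in this overview can be arranged as a simple lookup that answers "which libraries cover a given marker?". The sketch below is illustrative only: the `LIBRARY_MARKERS` dictionary and the helper names are hypothetical, BOLD is left out purely because its list is too long for a short example, and treating "18S" and "18S rRNA" as the same marker is an assumption made here to reconcile the different naming used across the lists.

```python
from collections import Counter

# Marker lists copied from the sections of this overview (BOLD omitted for brevity).
LIBRARY_MARKERS = {
    "MIDORI": ["COX1", "COX2", "COX3", "12S rRNA", "16S rRNA", "A6", "A8",
               "CytB", "ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6"],
    "MZGdb": ["COX1", "16S rRNA", "18S rRNA", "12S rRNA", "28S rRNA"],
    "Phytool": ["16S rRNA", "18S rRNA", "23S rRNA", "rbcL"],
    "Diat.barcode": ["COX1", "18S rRNA", "ITS2", "rbcL"],
    "UNITE": ["ITS1", "ITS2", "ITS3", "ITS4"],
    "SILVA": ["16S rRNA", "18S rRNA", "28S rRNA"],
    "PR2 primer database": ["18S", "23S", "28S", "5S", "ITS", "ITS1", "ITS2"],
}


def normalize(marker):
    """Make '18S rRNA' and '18S' compare equal (an assumption of this sketch)."""
    return marker.replace(" rRNA", "").strip().upper()


def libraries_for(marker):
    """List the libraries whose marker list includes the given marker."""
    wanted = normalize(marker)
    return [name for name, markers in LIBRARY_MARKERS.items()
            if wanted in {normalize(m) for m in markers}]


def marker_counts():
    """Count how many of the listed libraries include each marker."""
    counts = Counter()
    for markers in LIBRARY_MARKERS.values():
        counts.update({normalize(m) for m in markers})
    return counts


print(libraries_for("COX1"))           # ['MIDORI', 'MZGdb', 'Diat.barcode']
print(marker_counts().most_common(2))  # [('18S', 5), ('16S', 4)]
```

The counts cover only the seven libraries listed in the dictionary, so they are not comparable to the eDNAqua-Plan figures quoted earlier, which come from a broader mapping.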