Supplementary data: A comparison of six DNA extraction protocols for 16S, ITS, and shotgun metagenomic sequencing of microbial communities
Table S1. Mantel correlations in sample-sample distances between each candidate extraction kit and our standardized protocol, for bacterial/archaeal 16S sequence data. Data were rarefied to the maximum read depth that maintained 75% of samples, or had samples with fewer than that number of reads excluded when using RPCA distances (i.e., high biomass samples: 12,690 reads; low biomass samples: 3,295 reads).
Table S2. Mantel correlations in sample-sample distances between each candidate extraction kit and our standardized protocol, for fungal ITS sequence data. Data were rarefied to the maximum read depth that maintained 50% of samples, or had samples with fewer than that number of reads excluded when using RPCA distances (i.e., high biomass samples: 1,491 reads; low biomass samples: 344 reads).
Table S3. Mantel correlations in sample-sample distances between each candidate extraction kit and our standardized protocol, for bacterial/archaeal shotgun metagenomic sequence data. Data were rarefied to the maximum read depth that maintained 75% of samples, or had samples with fewer than that number of reads excluded when using RPCA distances (i.e., high biomass samples: 38,000 reads; low biomass samples: 600 reads).
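As an illustration of the statistic reported in Tables S1-S3, the following is a minimal sketch of a Mantel correlation between two sample-sample distance matrices. It assumes a Pearson correlation over the upper-triangle distances and a two-sided permutation test; the captions do not state the correlation method, the number of permutations, or the software used, so the function names, the default of 999 permutations, and the seed are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(dist, order):
    """Flatten the upper triangle of a square matrix, with rows and columns reordered."""
    n = len(order)
    return [dist[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d_kit, d_ref, permutations=999, seed=0):
    """Return (r, p) comparing a candidate-kit matrix with a reference matrix.

    Assumptions (not taken from the source): Pearson correlation, two-sided
    test, and samples listed in the same order in both matrices.
    """
    identity = list(range(len(d_kit)))
    x = upper_triangle(d_kit, identity)
    r_obs = pearson(x, upper_triangle(d_ref, identity))

    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute sample labels of the reference matrix
        if abs(pearson(x, upper_triangle(d_ref, order))) >= abs(r_obs):
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting rows and columns together, rather than shuffling individual distances, preserves the dependence among distances that share a sample, which is the reason a Mantel test is used in place of an ordinary correlation test.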
Figure S1. (A) Average concentration of DNA (ng/μL) across extraction protocols for each sample type (n = 1,184 samples). A miniaturized, high-throughput Quant-iT PicoGreen dsDNA assay was used, with a lower limit of 0.1 ng/μL indicated by the horizontal, dotted gray line in each panel. Yields below this value were estimated by extrapolating from a standard curve. (B) Average number of quality-filtered sequences for 16S data (n = 1,039 samples). Dashed lines indicate our expectation of 10,000 reads from human fecal samples. For both panels, red circles indicate group means, and vertical gray lines separate different sequencing runs. Because sampling effort was not normalized here, so as to preserve absolute values, comparisons should not be made across sequencing runs.
Figure S2. Sequences per sample across extraction protocols and sample types. (A) Average number of quality-filtered sequences for fungal ITS data (n = 991 samples). (B) Average number of host- and quality-filtered sequences for bacterial/archaeal metagenomic data (n = 1,037 samples). Dashed lines indicate our expectation of 1,000,000 reads from human fecal samples. For both panels, red circles indicate means, and vertical gray lines separate different sequencing runs. Because sampling effort was not normalized here, so as to preserve absolute read counts, comparisons should not be made across sequencing runs.
Figure S3. Within-sample variation across extraction kits, for bacterial/archaeal 16S data. Microbial community beta-diversity among replicate extractions of the same source sample was estimated using (A) Jaccard distance, (B) RPCA distance, (C) unweighted UniFrac distance, and (D) weighted UniFrac distance. Data were rarefied to the maximum read depth that maintained 75% of samples, or had samples with fewer than that number of reads excluded when using RPCA distances (i.e., high biomass samples: 12,690 reads; low biomass samples: 3,295 reads).
Figure S4. Within-sample variation across extraction kits, for fungal ITS data. Fungal community beta-diversity among replicate extractions of the same source sample was estimated using (A) Jaccard distance and (B) RPCA distance. Data were rarefied to the maximum read depth that maintained 50% of samples, or had samples with fewer than that number of reads excluded when using RPCA distances (i.e., high biomass samples: 1,491 reads; low biomass samples: 344 reads).
Figure S5. Within-sample variation across extraction kits, for bacterial/archaeal shotgun metagenomic sequence data. Microbial community beta-diversity among replicate extractions of the same source sample was estimated using (A) Jaccard distance, (B) RPCA distance, (C) unweighted UniFrac distance, and (D) weighted UniFrac distance. Data were rarefied to the maximum read depth that maintained 75% of samples, or had samples with fewer than that number of reads excluded when using RPCA distances (i.e., high biomass samples: 38,000 reads; low biomass samples: 600 reads).
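The depth rule repeated in the captions above ("the maximum read depth that maintained 75% of samples", or 50% for ITS) can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes "maintained" means that at least the stated fraction of samples have at least that many reads, and the function names, the dictionary representation of a sample, and the seed are hypothetical.

```python
import math
import random


def depth_retaining(read_counts, fraction):
    """Largest depth at which at least `fraction` of samples have enough reads.

    read_counts: total reads per sample; fraction: e.g. 0.75 or 0.50.
    """
    if not read_counts or not 0 < fraction <= 1:
        raise ValueError("need at least one sample and 0 < fraction <= 1")
    keep = math.ceil(fraction * len(read_counts))  # samples that must survive
    # The keep-th deepest sample sets the threshold.
    return sorted(read_counts, reverse=True)[keep - 1]


def rarefy(feature_counts, depth, seed=0):
    """Subsample one sample to `depth` reads without replacement.

    feature_counts: dict mapping feature ID to read count.
    Returns None when the sample has fewer than `depth` reads (sample dropped).
    """
    if sum(feature_counts.values()) < depth:
        return None
    # Expand to one entry per read; adequate for a sketch, not for large tables.
    pool = [f for f, c in feature_counts.items() for _ in range(c)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for f in picked:
        rarefied[f] = rarefied.get(f, 0) + 1
    return rarefied
```

For RPCA distances, the captions indicate that the same threshold was applied only as a filter, which under this sketch corresponds to dropping samples whose total falls below `depth_retaining(...)` without calling `rarefy`.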