Mock Community Generation

By Hand

mock

diversity

Creation of test communities

Published: December 21, 2021

Background

To make the differences between the metrics presented on this site more apparent, it helps to have a standard mock community that every metric can be applied to.

Here, I will make two sets of three communities: one set that differs in sequencing depth, and another that differs in evenness.

True community with known abundances

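First, I will define a true population with the “long tail” distribution typical of microbial communities. Our community will have 20 members: 10 above 1% abundance and 10 below 1%. Note that the weights below sum to 0.8555 rather than 1. sample() treats prob as relative weights and rescales them, so the realized proportions are slightly higher than the raw numbers (ASV1 ends up at about 35% rather than 30%).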

asv_probabilities = c(0.3, 0.2, 0.1, rep(0.05, 3), rep(0.025, 4), rep(0.001, 5), rep(0.0001, 5))
names(asv_probabilities) <- paste("ASV", 1:length(asv_probabilities), sep = "")

Each species is named “ASV” plus a number that increases with decreasing abundance. Don’t worry if you are not familiar with ASVs (amplicon sequence variants) or what they mean. For now, just think of them as the microbial species names.
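As a quick check on the true community defined above, the sketch below rescales the weights the way sample() effectively does (dividing by their total) and counts how many members sit above and below 1%. The name `true_proportions` is introduced here for illustration and is not part of the original code.

```r
# Weights as defined above; they sum to 0.8555, not 1
asv_probabilities <- c(0.3, 0.2, 0.1, rep(0.05, 3), rep(0.025, 4), rep(0.001, 5), rep(0.0001, 5))
names(asv_probabilities) <- paste0("ASV", seq_along(asv_probabilities))

# sample() treats prob as relative weights, so the effective proportions
# are the weights divided by their total (illustrative name, an assumption)
true_proportions <- asv_probabilities / sum(asv_probabilities)

sum(asv_probabilities)          # total weight before rescaling
round(true_proportions, 4)      # effective proportion of each ASV
sum(true_proportions > 0.01)    # number of members above 1% abundance
sum(true_proportions < 0.01)    # number of members below 1% abundance
```

Rescaling does not move any member across the 1% line, so the 10/10 split described above still holds.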

Different sequencing depths (mock community #1)

Starting with this distribution of the true community, let’s take 3 samples at depths of 10,000, 1,000 and 100 reads. Sampling with replace=TRUE makes every draw independent of the previous ones, which simulates drawing from a very large (effectively infinite) population.

# table() counts the reads per ASV; an ASV that is never drawn is absent from the result
Sample_1 = table(sample(x=names(asv_probabilities), size=10000, replace=TRUE, prob=asv_probabilities))
Sample_2 = table(sample(x=names(asv_probabilities), size=1000, replace=TRUE, prob=asv_probabilities))
Sample_3 = table(sample(x=names(asv_probabilities), size=100, replace=TRUE, prob=asv_probabilities))
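Shallower samples tend to miss the rare members entirely. A minimal sketch of why, assuming independent draws (which is what replace=TRUE gives): the chance that an ASV with proportion p appears at least once in n reads is 1 - (1 - p)^n. The helper `detection_probability` and the other new names are hypothetical and used only for this illustration.

```r
asv_probabilities <- c(0.3, 0.2, 0.1, rep(0.05, 3), rep(0.025, 4), rep(0.001, 5), rep(0.0001, 5))
names(asv_probabilities) <- paste0("ASV", seq_along(asv_probabilities))
true_proportions <- asv_probabilities / sum(asv_probabilities)

# Probability of seeing an ASV at least once in `depth` independent reads
# (hypothetical helper, not part of the original code)
detection_probability <- function(p, depth) 1 - (1 - p)^depth

depths <- c(Sample_1 = 10000, Sample_2 = 1000, Sample_3 = 100)

# Matrix with one row per ASV and one column per sequencing depth
detection <- sapply(depths, function(n) detection_probability(true_proportions, n))
round(detection, 3)
```

Reading across a row shows how quickly a rare ASV becomes unlikely to be observed as depth drops.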

Different diversity/evenness (mock community #2)

Next, we will take these same 20 species and define different true proportions to sample from. These three communities have decreasing evenness: 1) all 20 species perfectly even, 2) ten species 10x as abundant as the other ten, and 3) two species 100x as abundant as the other 18.

Then, each will be sampled 10,000 times to give us the same sequencing effort.

Sample_4 = table(sample(x=paste("ASV", 1:20, sep = ""), size=10000, replace=TRUE, prob=rep(1,20)))
Sample_5 = table(sample(x=paste("ASV", 1:20, sep = ""), size=10000, replace=TRUE, prob=c(rep(10,10), rep(1,10))))
Sample_6 = table(sample(x=paste("ASV", 1:20, sep = ""), size=10000, replace=TRUE, prob=c(rep(100,2), rep(1,18))))
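To see what these weights imply before any sampling noise is added, the sketch below converts each weight vector into expected read counts at a depth of 10,000. The names `weights`, `depth` and `expected_counts` are assumptions made for this example.

```r
asv_names <- paste0("ASV", 1:20)

# Relative weights for the three evenness scenarios described above
weights <- list(
  Sample_4 = rep(1, 20),
  Sample_5 = c(rep(10, 10), rep(1, 10)),
  Sample_6 = c(rep(100, 2), rep(1, 18))
)

depth <- 10000

# Expected count = depth * weight / total weight, one column per sample
expected_counts <- sapply(weights, function(w) depth * w / sum(w))
rownames(expected_counts) <- asv_names

# Compare the most and least favored ASV in each scenario
round(expected_counts[c("ASV1", "ASV20"), ], 1)
```

The expectation is 500 reads per ASV in Sample_4, roughly 909 versus 91 in Sample_5, and roughly 4,587 versus 46 in Sample_6. Any single random draw will scatter around those values.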

Combining into one dataset

Now, these 6 samples can be made into an “ASV table” with samples in columns, and species names in rows.

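library(tidyverse)  # attaches dplyr, tidyr and forcats, which supply the functions used below
# gtools must also be installed; it is called below as gtools::mixedsort()

d = list(Sample_1, Sample_2, Sample_3, Sample_4, Sample_5, Sample_6)
names(d) = c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5", "Sample_6")

# Stack the six count tables, set ASVs missing from a sample to 0,
# reshape to ASVs in rows and samples in columns, then sort ASV1, ASV2, ..., ASV20
mock_community = bind_rows(d, .id = "SAMPLE") %>%
    pivot_longer(cols = c(everything(), -SAMPLE), names_to = "ASV") %>%
    mutate(value = as.numeric(value), value = ifelse(is.na(value), 0, value)) %>%
    pivot_wider(names_from = SAMPLE, values_from = value) %>%
    mutate(ASV = fct_relevel(ASV, gtools::mixedsort(unique(as.character(.$ASV))))) %>%
    arrange(ASV)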

Table: Sample_1 to Sample_3 have the same evenness and differing depth; Sample_4 to Sample_6 have the same depth and differing evenness.
ASV	Sample_1	Sample_2	Sample_3	Sample_4	Sample_5	Sample_6
ASV1	3589	334	36	503	934	4584
ASV2	2337	214	20	536	897	4594
ASV3	1134	129	14	504	926	42
ASV4	559	68	10	498	875	47
ASV5	533	63	4	480	873	56
ASV6	624	63	6	513	848	40
ASV7	268	33	2	531	955	50
ASV8	302	27	2	524	926	44
ASV9	300	30	4	494	936	50
ASV10	300	34	2	510	932	44
ASV11	14	0	0	511	92	45
ASV12	8	0	0	480	79	46
ASV13	11	1	0	475	89	42
ASV14	8	0	0	491	85	34
ASV15	6	3	0	492	101	46
ASV16	0	0	0	502	80	52
ASV17	1	0	0	485	87	47
ASV18	1	0	0	502	97	36
ASV19	3	1	0	496	95	44
ASV20	2	0	0	473	93	57
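The zeros in the table come from filling in NA after the samples are combined. One possible alternative, sketched here in base R and not the method used above, is to count each sample against the full set of ASV names so that undetected ASVs get an explicit 0 from the start. The helper `draw_sample` and the seed are assumptions for this example, and a fresh draw will not reproduce the exact counts shown in the table.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable; it does not reproduce the table above

asv_names <- paste0("ASV", 1:20)
asv_probabilities <- c(0.3, 0.2, 0.1, rep(0.05, 3), rep(0.025, 4), rep(0.001, 5), rep(0.0001, 5))

# Draw reads and count them against all 20 ASV names, so ASVs that were
# never drawn appear with a count of 0 instead of being dropped
# (hypothetical helper, not part of the original code)
draw_sample <- function(depth, weights) {
  reads <- sample(asv_names, size = depth, replace = TRUE, prob = weights)
  table(factor(reads, levels = asv_names))
}

# ASVs in rows, samples in columns
counts <- cbind(
  Sample_1 = draw_sample(10000, asv_probabilities),
  Sample_2 = draw_sample(1000, asv_probabilities),
  Sample_3 = draw_sample(100, asv_probabilities)
)

colSums(counts)  # each column total equals its sequencing depth
```

Checking the column totals against the intended depths is a cheap way to confirm that no reads were lost while reshaping.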

OK, everything looks good. This dataset can now be saved so that other pages can load it as needed.
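A minimal sketch of one way to save and reload the table, assuming an RDS file is acceptable. The file name and location are hypothetical, and the two-row data frame is only a stand-in (using the first two rows of Sample_1 and Sample_2 from the table above) so that the example runs on its own.

```r
# Stand-in for the full table: first two rows of Sample_1 and Sample_2
mock_community <- data.frame(
  ASV = c("ASV1", "ASV2"),
  Sample_1 = c(3589, 2337),
  Sample_2 = c(334, 214)
)

# Hypothetical output location; swap in a path inside your own project
out_file <- file.path(tempdir(), "mock_community.rds")

saveRDS(mock_community, out_file)   # write the R object to disk
reloaded <- readRDS(out_file)       # read it back, e.g. from another page

all.equal(mock_community, reloaded) # confirm the round trip preserved the data
```

RDS keeps column types intact, which a plain CSV round trip may not. write.csv() is a reasonable choice if the table also needs to be opened outside R.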