Richness or Observed Species (S)

By Hand

How many unique bugs are in there?

Published

December 22, 2021

Richness \(\textcolor{#037bcf}{S}\) is a measure of alpha-diversity, and is perhaps the easiest to calculate and comprehend. \(\textcolor{#037bcf}{S}\) is simply the count of the number of species with an abundance > 0. \(\textcolor{#037bcf}{S}\) is a per-sample metric (as opposed to per-species) in that a single value is obtained describing the entire sample.

Simple Calculation

We can easily calculate the richness of the 6 mock communities generated previously. As a reminder, Samples 1-3 had the same underlying community structure but were sampled to different depths, and Samples 4-6 had a decrease in evenness but were sampled to the same depth.

load(here("static", "data", "mock_community.rds"))

S = mock_community %>%
  pivot_longer(cols = -ASV, names_to = "SAMPLE", values_to = "ABUNDANCE") %>%
  group_by(SAMPLE) %>%
  filter(ABUNDANCE > 0) %>%
  tally(name = "S")

Sample	S
Sample_1	19
Sample_2	13
Sample_3	10
Sample_4	20
Sample_5	20
Sample_6	20

Notice how although samples 1-3 were all sampled from the same true population, we miss the more rare species simply by not sequencing to enough depth to capture them and \(\textcolor{#037bcf}{S}\) goes down. We saw similar results in the Jaccard analysis of these communities.

The more samples you have, the more you increase the chance that a species found in one sample is missing from another. Or, the larger the range of sequencing effort, the greater chance species found in the deeply sequenced samples are missing from the shallow sequenced samples.

For Samples 4-6, they were all sampled to a sufficient depth so all 20 species were found. That is about all that \(\textcolor{#037bcf}{S}\) can tell us, however, and we do not see the distribution differences here.

Advanced

There are more complicated ways to calculate \(\textcolor{#037bcf}{S}\) that attempt to use more rigorous statistics. The most popular of these are Chao¹ and ACE², and are both implemented in the vegan package in the estimateR() function (try ?estimateR for more information). These calculate \(\textcolor{#037bcf}{S}\) as well as the standard error (denoted with “se”). S.obs is the “simple” richness calculated as we did above.

V = mock_community %>%
  column_to_rownames("ASV") %>%
  t() %>%
  vegan::estimateR()

Sample	S.obs	S.chao1	se.chao1	S.ACE	se.ACE
Sample_1	19	19.5	1.287887	20.35391	2.103822
Sample_2	13	14.0	2.283481	16.66667	1.178569
Sample_3	10	10.0	0.000000	10.00000	1.449138
Sample_4	20	20.0	0.000000	NaN	NaN
Sample_5	20	20.0	0.000000	NaN	NaN
Sample_6	20	20.0	0.000000	NaN	NaN

For Samples 1-3 both methods did a fairly good job. We will look a little more closely at this method (as well as calculate it “by hand”) below.

ACE doesn’t deal well when there are no singletons/doubletons (or even ASVs found less than 10 times³), and that is why it failed to return a value for Samples 4-6.

Look into ChaoSpecies in the SpadeR package for more methods. Our sample dataset it too tiny to get meaningful results from these powerful tools, and they work better on real data. There is a Chao2 metric which uses presence/absence, but it doesn’t seem to pop-up much anywhere.

Chao1

Of all the values obtained in this advanced section, only the Chao1 is reasonably simple to do by hand. In the equation below, F1 and F2 are the counts of singletons and doubletons, respectively.

\[\textcolor{#037bcf}{S_{Chao1} = S_{Obs} + {f_1(f_1 - 1)\over2(f_2 + 1)}}\]

C = mock_community %>%
  pivot_longer(cols = c(everything(), -ASV), names_to = "SAMPLE", values_to = "ABUNDANCE") %>%
  mutate(S = case_when(  
    ABUNDANCE >  0 ~ 1)) %>%
  mutate(F1 = case_when(  
    ABUNDANCE == 1 ~ 1)) %>%
  mutate(F2 = case_when(  
    ABUNDANCE == 2 ~ 1)) %>%
  group_by(SAMPLE) %>%
  summarise(S = sum(S, na.rm = T),
            F1 = sum(F1, na.rm = T), 
            F2= sum(F2, na.rm = T)) %>%
  mutate(CHAO1 = S + ((F1 * (F1 - 1)) / (2 * (F2 + 1))))

Sample	S	F1	F2	CHAO1
Sample_1	19	2	1	19.5
Sample_2	13	2	0	14.0
Sample_3	10	0	3	10.0
Sample_4	20	0	0	20.0
Sample_5	20	0	0	20.0
Sample_6	20	0	0	20.0

Looking at the equation above, it is obvious why singletons are so important to these calculations. If there are no singletons, then the fraction equates to \(0\) and all you get is \(\textcolor{#037bcf}{S_{Chao1} = S_{Obs}}\).

Additionally, we see an overestimation in Sample_2 because of the high singleton but low doubleton count. This makes

\[\textcolor{#037bcf}{S_{Chao1} = 14 + {4(4 - 1)\over2(0 + 1)}}\]

which simplifies to

\[\textcolor{#037bcf}{S_{Chao1} = 14 + \frac{12}{2} = 20}\]

and is almost a 50% overestimation!

Richness or Observed Species (S)

Simple Calculation

Advanced

Chao1

Further Reading

Footnotes