The Species Sensitivity Distribution (SSD) approach is a central tool in environmental risk assessment for defining safe levels of a contaminant across a set of species. It assumes that species sensitivity to a given contaminant can be described by a probability distribution, estimated from a set of toxicity values previously obtained from either MOSAIC\(_{surv}\) or MOSAIC\(_{repro}\). MOSAIC\(_{SSD}\) enables any user to perform a simple yet statistically sound SSD analysis that includes censored toxicity values (unbounded on one side or known only as an interval), without worrying about the conceptually difficult underlying statistical questions [1]. MOSAIC\(_{SSD}\) provides the so-called hazardous concentration for p% of the species (\(HC_p\)). All calculations are based on the companion R package fitdistrplus [2].
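To make the definition of \(HC_p\) concrete, the following minimal sketch computes it as the p% quantile of a log-normal distribution using base R only. The parameter values and the helper name hc are hypothetical and chosen for illustration; they are not output from MOSAIC\(_{SSD}\).

```r
# Hypothetical fitted parameters on the natural-log scale (illustrative only).
meanlog <- 4.0
sdlog   <- 2.0

# HC_p is the p% quantile of the fitted distribution
# ('hc' is a helper name introduced for this sketch).
hc <- function(p, meanlog, sdlog) {
  qlnorm(p / 100, meanlog = meanlog, sdlog = sdlog)
}

hc5 <- hc(5, meanlog, sdlog)

# By construction, the fitted distribution places 5% of species below HC5.
plnorm(hc5, meanlog = meanlog, sdlog = sdlog)
```

The same quantile reading applies to any other distribution fitted to the toxicity values; only the quantile function changes.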
Within MOSAIC\(_{SSD}\), once the data set is uploaded, the user can choose to fit the log-normal distribution, the log-logistic distribution, or both. The value of the likelihood function for each distribution is provided on the result page and can be used as a further decision criterion (the higher the likelihood value, the more appropriate the distribution). The log-logistic distribution has heavier tails than the log-normal and is therefore generally more conservative in the determination of the 5% hazardous concentration (\(HC_5\)).
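As an illustration of how a likelihood can be computed and compared when some values are censored, here is a minimal base-R sketch. It is not the code used by MOSAIC\(_{SSD}\), which relies on fitdistrplus; the function names negloglik and fit are introduced for this example, and the data are the values that appear in the R script provided later in this tutorial. Each observation contributes a density (pointwise value), a cumulative probability (only an upper bound known), a survival probability (only a lower bound known), or a difference of cumulative probabilities (interval).

```r
# Data values taken from the R script provided later in this tutorial.
left  <- c(3.8, 33.6, 87, 1700, 640, 1155, 113, 129, 586, 1856, 1.6, 4.8, 82, 155)
right <- c(3.8, 33.6, 87, NA, 640, NA, 113, 129, 586, NA, 1.6, 4.8, 82, 155)

# Negative log-likelihood, working on log-concentrations.
# cdf/pdf: pnorm/dnorm for log-normal, plogis/dlogis for log-logistic.
negloglik <- function(par, left, right, cdf, pdf) {
  loc <- par[1]
  scl <- exp(par[2])                    # keeps the scale parameter positive
  exact <- !is.na(left) & !is.na(right) & left == right
  lcens <- is.na(left)                  # only an upper bound is known
  rcens <- is.na(right)                 # only a lower bound is known
  icens <- !is.na(left) & !is.na(right) & left != right
  ll <- c(
    log(pdf(log(left[exact]), loc, scl) / left[exact]),
    log(cdf(log(right[lcens]), loc, scl)),
    log(1 - cdf(log(left[rcens]), loc, scl)),
    log(cdf(log(right[icens]), loc, scl) - cdf(log(left[icens]), loc, scl))
  )
  -sum(ll)
}

# Fit by numerical optimisation, starting from crude estimates that
# treat each observation as its known bound (or interval midpoint).
fit <- function(cdf, pdf) {
  mid <- log(rowMeans(cbind(left, right), na.rm = TRUE))
  optim(c(mean(mid), log(sd(mid))), negloglik,
        left = left, right = right, cdf = cdf, pdf = pdf)
}

fit_lnorm  <- fit(pnorm, dnorm)
fit_llogis <- fit(plogis, dlogis)

# Maximised log-likelihoods; the higher value indicates the better fit.
loglik <- c(lnorm = -fit_lnorm$value, llogis = -fit_llogis$value)
loglik
```

Both distributions have two parameters here, so the maximised log-likelihoods can be compared directly without a penalty for model complexity.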
After you click Run, 95% bootstrap confidence intervals are computed automatically for the parameters of the chosen distribution and for several \(HC_p\) values. Computing the confidence intervals by bootstrap has the advantage of providing a unified framework for every distribution. Because the bootstrap procedure does not necessarily converge, depending on the size of the data set, an automatic check of bootstrap convergence is implemented [1].
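The principle of a bootstrap confidence interval on \(HC_5\) can be sketched in a few lines of base R. This is a simplified illustration restricted to pointwise (uncensored) data and a log-normal distribution, not the procedure implemented in MOSAIC\(_{SSD}\); the toxicity values, the helper name hc5_lnorm and the number of iterations are all assumptions made for the example.

```r
set.seed(1)  # makes the resampling reproducible

# Hypothetical pointwise toxicity values (illustrative, not from the tool).
tox <- c(1.45, 2.31, 0.56, 3.8, 12.0, 0.9, 5.2, 7.7, 2.0, 1.1)

# Log-normal HC5 from the maximum-likelihood estimates on the log scale.
hc5_lnorm <- function(x) {
  lx <- log(x)
  sdlog <- sqrt(mean((lx - mean(lx))^2))  # ML estimate (divides by n)
  qlnorm(0.05, meanlog = mean(lx), sdlog = sdlog)
}

# Nonparametric bootstrap: resample species with replacement and refit.
boot_hc5 <- replicate(2000, hc5_lnorm(sample(tox, replace = TRUE)))

# Percentile 95% confidence interval, shown next to the point estimate.
ci <- quantile(boot_hc5, probs = c(0.025, 0.975))
c(estimate = hc5_lnorm(tox), ci)
```

With small data sets the interval can change noticeably from one run to the next unless the number of iterations is large, which is one reason a convergence check is useful.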
When using MOSAIC\(_{SSD}\), the first step is to upload input data.
You can upload your own data (click on Load from a file), taking care to follow the format specification described below.
The expected data for an SSD analysis are a set of toxicity values of a given contaminant, estimated for several species. You must upload your data as a tabular text file, with one line per species. The exact syntax of the lines depends on whether you are dealing with pointwise data (toxicity values known without uncertainty) or censored data (toxicity values known only through a lower bound, an upper bound, or both). In either case, only positive values are accepted.
Pointwise data: the file must contain one positive value per line, as in the following example:
1.45
2.31
0.56
Censored data: each line must contain two values (a lower bound and an upper bound), separated by a TAB character; please note that the TAB character can be replaced by a comma or spaces for convenience. Missing bounds must be denoted with NA. If a toxicity value is known as a pointwise value, enter it twice, with the same value as the lower and the upper bound, as in the following example:
1.45 1.85
2.31 NA
NA 0.99
1.11 1.11
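For readers who want to check a file before uploading it, the following base-R sketch reads the censored format into a data frame with left and right columns and labels each line by censoring type. It is an illustration under assumptions, not the parser used by MOSAIC\(_{SSD}\): the variable names and the type labels are introduced here, and the input is the example above with mixed separators.

```r
# Example content in the censored format (TAB, comma or spaces as separator).
txt <- "1.45\t1.85\n2.31,NA\nNA 0.99\n1.11 1.11"

# Normalise separators to a single space, then read two numeric columns.
lines <- gsub("[,\t]+", " ", strsplit(txt, "\n")[[1]])
cens <- read.table(text = lines, col.names = c("left", "right"),
                   na.strings = "NA", colClasses = "numeric")

# Only positive values are accepted.
stopifnot(all(cens > 0, na.rm = TRUE))

# Label each line (the labels are illustrative).
cens$type <- ifelse(is.na(cens$left), "left-censored",
             ifelse(is.na(cens$right), "right-censored",
             ifelse(cens$left == cens$right, "pointwise", "interval")))
cens
```

A data frame with columns named left and right is also the layout used by the R script provided at the end of this tutorial.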
To try MOSAIC\(_{SSD}\) with an example data set, choose the tab menu Try with an example. Choose Fluazinam to get the same results as in this tutorial; these data are censored and correspond to 48-hour acute \(EC_{50}\) values for exposure of macro-invertebrates and zooplankton to fluazinam. Then, click Run.
After choosing one (or two) probability distribution(s) and clicking on Run, you immediately get the estimated distribution(s): the dotted green and solid red curves correspond to the fitted log-normal and log-logistic distributions, respectively. The stepwise curve corresponds to the Turnbull estimate of the cumulative distribution function of the input censored toxicity data. At this stage, only point estimates of the distribution parameters and \(HC_p\) are available.
After a while, bootstrap confidence intervals are provided for both parameter and \(HC_p\) estimates:
As with the other modules, MOSAIC\(_{SSD}\) provides an R script that lets you perform further calculations directly within the R software [3].
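# To use this script, it is recommended to consult the reference manual of the
# fitdistrplus package http://cran.r-project.org/web/packages/fitdistrplus/fitdistrplus.pdf
library(fitdistrplus)
library(actuar)  # provides the log-logistic distribution ('llogis')

# Starting values for the fit; only the log-logistic distribution needs them.
start_arg <- function(distname, data) {
  # Split the observations by censoring type.
  lcens <- data[is.na(data$left), ]$right
  rcens <- data[is.na(data$right), ]$left
  ncens <- data[data$left == data$right & !is.na(data$left) & !is.na(data$right), ]$left
  icens <- data[data$left != data$right & !is.na(data$left) & !is.na(data$right), ]
  # Treat known bounds and interval midpoints as rough pointwise values.
  data <- c(rcens, lcens, ncens, (icens$left + icens$right) / 2)
  if (distname == 'llogis') {
    # Moment matching on the log scale (logistic variance = scale^2 * pi^2 / 3).
    data <- log(data)
    n <- length(data)
    m <- mean(data)
    v <- (n - 1) / n * var(data)
    scale <- sqrt(3 * v) / pi
    # fitdistcens documents 'start' as a named list.
    list(scale = exp(m), shape = 1 / scale)
  } else {
    NULL
  }
}

# Example data set; NA in 'right' marks a value with no known upper bound.
data <- data.frame(
  left  = c(3.8, 33.6, 87, 1700, 640, 1155, 113, 129, 586, 1856, 1.6, 4.8, 82, 155),
  right = c(3.8, 33.6, 87, NA, 640, NA, 113, 129, 586, NA, 1.6, 4.8, 82, 155)
)

# Maximum-likelihood fits on the censored data.
ft_lnorm <- fitdistcens(data, 'lnorm', start = start_arg('lnorm', data))
summary(ft_lnorm)
ft_llogis <- fitdistcens(data, 'llogis', start = start_arg('llogis', data))
summary(ft_llogis)

# Fitted cumulative distributions plotted over the empirical one.
cdfcompcens(list(ft_lnorm, ft_llogis), xlogscale = TRUE,
            legendtext = c('lnorm', 'llogis'),
            xlab = 'Concentration in log scale',
            ylab = 'Potentially affected fraction')

# Bootstrap confidence intervals on HC5, HC10, HC20 and HC50.
bt_lnorm <- bootdistcens(ft_lnorm, niter = 5000)
quantile(bt_lnorm, probs = c(0.05, 0.1, 0.2, 0.5))
bt_llogis <- bootdistcens(ft_llogis, niter = 5000)
quantile(bt_llogis, probs = c(0.05, 0.1, 0.2, 0.5))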
[1] Kon Kam King G, Veber P, Charles S, Delignette-Muller ML. 2014. MOSAIC_SSD: a new web tool for species sensitivity distribution to include censored data by maximum likelihood. Environ. Toxicol. Chem. 33:2133–9.
[2] Delignette-Muller ML, Dutang C. 2015. fitdistrplus: An R Package for Fitting Distributions. J. Stat. Softw. 64:1–34.
[3] R Core Team. 2018. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.