This function simulates NSim fully determined communities by attributing a full botanical name to every tree that has a vernacular name and either was not fully determined or has BotaSource=Vern, using a categorical-Dirichlet association scheme. Trees with BotaSource=Vern are replaced because their botanical name was itself derived from the vernacular name, using the most likely association in the Guyafor database. Since the objective is to assign a botanical name to trees that only have a vernacular name, using a probability of association, that derived botanical name is not kept.

Usage

SimFullCom(
  Data2fill,
  DataAsso = NULL,
  prior = NULL,
  wp = 0.5,
  NSim,
  eps = 0.01,
  Determ = NULL
)

Arguments

Data2fill

data.table of the dataset for which the gap filling of botanical names from vernacular names will be done. This dataset must have been prepared using the function PrepData.

DataAsso

data.table of the dataset that will be used to build the association matrix, formatted as shown in the vignette. This dataset must have been prepared using the function PrepData. (Default is NULL: the dataset Data2fill is used to build the association matrix)

prior

data.frame of expert knowledge association used as a prior. This dataset must have been prepared using the function PrepPrior. (Default is NULL: no prior information is used)

wp

numeric value giving the weighting of the prior information (Default is 0.5).

NSim

positive integer: number of simulated communities that we want to obtain

eps

epsilon: background noise for species not associated with a given vernacular name. Default is 0.01.

Determ

boolean: if TRUE, the most likely botanical name is returned when a vernacular-botanical association is performed. If FALSE, the botanical names are drawn using a categorical-Dirichlet association scheme. If NSim is set to 1, a value must be provided for Determ. If NSim is greater than 1, the default is FALSE.
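The difference between the two modes of Determ can be illustrated with a minimal base R sketch. It is not the package's implementation: the weights in alpha are made up, the species names are placeholders, and rdirichlet1() and pick_name() are hypothetical helpers written for this example. It assumes that the "most likely" name is the one with the largest weight.

```r
# Made-up posterior weights for one vernacular name (placeholder species names).
alpha <- c("GenusA-sp1" = 12, "GenusA-sp2" = 12, "GenusA-sp3" = 0.01)

# One draw from a Dirichlet distribution, obtained by normalising Gamma variates.
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  setNames(g / sum(g), names(alpha))
}

# pick_name() is an illustrative helper, not a function of the package.
pick_name <- function(alpha, Determ) {
  if (Determ) {
    # Deterministic mode: keep the name(s) with the largest weight.
    best <- names(alpha)[alpha == max(alpha)]
    tie <- length(best) > 1
    # In case of a tie, one of the tied names is drawn at random.
    name <- if (tie) sample(best, 1) else best
    list(name = name, tie = tie)
  } else {
    # Stochastic mode: draw probabilities, then draw one name from them.
    p <- rdirichlet1(alpha)
    list(name = sample(names(alpha), 1, prob = p), tie = FALSE)
  }
}

set.seed(1)
det <- pick_name(alpha, Determ = TRUE)   # one of the two tied names, tie = TRUE
sto <- pick_name(alpha, Determ = FALSE)  # any name, with probability given by the draw
```

With Determ = FALSE, repeated calls can return different names for the same tree, which is what makes the simulated communities differ from one another.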

Value

This function returns a list of NSim data.tables, each one being the original data with two additional columns:

  • GensSpCor: The Genus and species after gap filling

  • BotaCorCode: the type of correction

    • fullyDet: tree with a fully determined name => no correction.

    • Det2Genus: identified to the genus => the corrected name is of the form Genus-Indet. (so we keep the botanically identified genus name and say that the species is Indet.)

    • Det2Fam: identified to the family => the corrected name is of the form Family-Indet. (so we keep the botanically identified Family name and say that the species is Indet.)

    • NoCor: no correction is made => the corrected name is Indet.-Indet.

    • AssoByGenus, AssoByGenusDeterm and AssoByGenusDetermT: a full identification is given by association with the vernacular name, using a Dirichlet-Categorical scheme (if Determ is set to FALSE) or the most likely association (if Determ is set to TRUE), restricting the candidates to species of the same genus

    • AssoByFam, AssoByFamDeterm and AssoByFamDetermT: a full identification is given by association with the vernacular name, using a Dirichlet-Categorical scheme (if Determ is set to FALSE) or the most likely association (if Determ is set to TRUE), restricting the candidates to species of the same family

    • AssoByVerna, AssoByVerna and AssoByVernaT: a full identification is given by association with the vernacular name, using a Dirichlet-Categorical scheme (if Determ is set to FALSE) or the most likely association (if Determ is set to TRUE)

    For the last three cases, if Determ is set to TRUE and more than one species has the maximum likelihood, one of these species is drawn at random and a T (for tie) is added at the end of the code.
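Since the result is a list with one table per simulation, a common next step is to stack the simulations and summarise them. The sketch below uses base R data.frames as a mock stand-in for the returned data.tables. The tree identifier idTree, the species names and the values are invented for illustration; only the column names GensSpCor and BotaCorCode come from the description above.

```r
# Mock stand-in for the returned list: 3 simulations of the same 4 trees.
# Real output would contain all original columns; `idTree` is a hypothetical id.
codes <- c("fullyDet", "AssoByGenus", "Det2Genus", "NoCor")
sims <- list(
  data.frame(idTree = 1:4, BotaCorCode = codes,
             GensSpCor = c("Ga-s1", "Gb-s1", "Gb-Indet.", "Indet.-Indet.")),
  data.frame(idTree = 1:4, BotaCorCode = codes,
             GensSpCor = c("Ga-s1", "Gb-s2", "Gb-Indet.", "Indet.-Indet.")),
  data.frame(idTree = 1:4, BotaCorCode = codes,
             GensSpCor = c("Ga-s1", "Gb-s1", "Gb-Indet.", "Indet.-Indet."))
)

# Stack all simulations, keeping track of the simulation index.
all_sims <- do.call(
  rbind,
  Map(function(d, i) cbind(sim = i, d), sims, seq_along(sims))
)

# Share of simulations in which each tree received each name (rows sum to 1).
freq <- prop.table(table(all_sims$idTree, all_sims$GensSpCor), margin = 1)

# Number of trees per correction type in the first simulation.
n_by_code <- table(sims[[1]]$BotaCorCode)
```

Trees that were fully determined keep the same name in every simulation, while trees filled by association can vary, so freq gives a quick view of how uncertain each assignment is.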

Details

This function performs the following steps:

  • get a data.table containing the matrix of posterior Alpha (using the function CreateAlpha): the prior knowledge in the dataset prior is updated with the observations of the dataset DataAsso, using a Dirichlet-Categorical scheme

  • get NSim fully determined communities by calling the function Get1Sim NSim times
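The two steps above can be sketched in base R for a toy case. This is a simplified illustration of a Dirichlet-Categorical update, not the code of CreateAlpha or Get1Sim: the counts and names are made up, the way eps enters the prior is an assumption, and the weighting by wp is deliberately left out because its exact form is not described here.

```r
# Made-up observation counts: vernacular names in rows, botanical names in columns.
obs <- matrix(c(8, 2, 0,
                0, 1, 5), nrow = 2, byrow = TRUE,
              dimnames = list(c("vern_a", "vern_b"), c("G-s1", "G-s2", "H-s1")))

# Made-up expert prior: 1 where an association is considered possible, else 0.
prior <- matrix(c(1, 1, 0,
                  0, 0, 1), nrow = 2, byrow = TRUE, dimnames = dimnames(obs))
eps <- 0.01

# Step 1 (assumed form): species without a prior association get eps as
# background noise, then the observed counts are added (conjugate update).
alpha_prior <- prior
alpha_prior[prior == 0] <- eps
alpha_post <- alpha_prior + obs

# One Dirichlet draw via normalised Gamma variates.
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

# Step 2: for one tree, draw probabilities from its row of alpha, then a name.
draw_name <- function(vern) {
  p <- rdirichlet1(alpha_post[vern, ])
  sample(colnames(alpha_post), 1, prob = p)
}

trees <- c("vern_a", "vern_a", "vern_b")  # vernacular names of three trees
NSim <- 5
set.seed(42)
# One simulated community per repetition, each a vector of drawn names.
sims <- replicate(NSim, vapply(trees, draw_name, character(1)), simplify = FALSE)
```

Because eps is small but positive, a species never observed or listed for a vernacular name can still be drawn occasionally, which is the role of the background noise.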