
This function performs several simulations like CompareSim, but repeats them on different samples of the dataset used as the test set.

Usage

CompareSample(
  NbSamples = 3,
  Param = NULL,
  priors = NULL,
  D2fill,
  DAsso = NULL,
  pc2fill = NULL,
  pcFamilyDet = NULL,
  pcGenusDet = NULL,
  NbSim = 1,
  Results_Simulations = FALSE,
  parallel = FALSE
)

Arguments

NbSamples

an integer: the number of test dataset samples to compare. All other parameters are the same as for the function CompareSim.

Param

a data.table with the parameters for each scenario:

  • priors : a vector with the rank of the priors to use in the priors list

  • dataAsso : a vector with the rank of the observation data to use in the DAsso list (if DAsso is not provided, use 1)

  • weights : a vector with the weights of the priors

  • eps : a vector with the epsilon value for each scenario

  • Determ : a vector with the value of Determ (boolean)
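The columns listed above can be pictured with a small scenario table. This is only a sketch in base R: the values are made up, and the use of list columns for priors and weights (so that one scenario can combine several priors) is an assumption, not something this page confirms. The function may also expect a data.table rather than a data.frame; that conversion is left out so the sketch runs without extra packages.

```r
# Hypothetical scenario table: three scenarios, illustrative values only.
Param <- data.frame(
  dataAsso = c(1, 1, 1),           # rank of the observation data in the DAsso list
  eps      = c(0.01, 0.01, 0.05),  # epsilon value for each scenario
  Determ   = c(FALSE, FALSE, TRUE) # value of Determ for each scenario
)

# Assumption: priors and weights are stored as list columns,
# so the third scenario can use two priors with one weight each.
Param$priors  <- list(1, 2, c(1, 2))
Param$weights <- list(1, 1, c(0.5, 0.5))

# Sanity check: every scenario has exactly one weight per prior.
stopifnot(all(lengths(Param$priors) == lengths(Param$weights)))
```

Each row describes one scenario, so the number of rows is the number of scenarios compared on every sampled test dataset.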

priors

a list of datasets containing the priors for each scenario. These datasets must have been prepared using the function PrepPrior. (Default is NULL: no prior information is used).

D2fill

a dataset to fill. This dataset must have been prepared using the function PrepData.

DAsso

a list of observation datasets, one used for each scenario. These datasets must have been prepared using the function PrepData (Default is NULL: the dataset D2fill is used to build the association matrix).

pc2fill

the percentage of the data (D2fill) to fill.

pcFamilyDet

the percentage of data determined at the family level (from the subset of D2fill to fill). We recommend using a percentage equivalent to that in the data to gap-fill.

pcGenusDet

the percentage of data determined at the genus level (from the subset of D2fill to fill; the rest is not determined at all). We recommend using a percentage equivalent to that in the data to gap-fill.
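The way pc2fill, pcFamilyDet and pcGenusDet partition the data can be worked through with simple arithmetic. The sketch below assumes the percentages are expressed on a 0-100 scale, which this page does not state, and uses made-up values; it does not call any function of the package.

```r
# Assumption: percentages are on a 0-100 scale; all values are illustrative.
n_obs       <- 1000  # rows in the dataset to fill
pc2fill     <- 10    # share of the data used as test subset
pcFamilyDet <- 25    # share of the test subset determined at the family level
pcGenusDet  <- 25    # share of the test subset determined at the genus level

n_test   <- round(n_obs * pc2fill / 100)       # rows in the test subset
n_family <- round(n_test * pcFamilyDet / 100)  # rows determined at the family level
n_genus  <- round(n_test * pcGenusDet / 100)   # rows determined at the genus level
n_undet  <- n_test - n_family - n_genus        # the rest: not determined at all

print(c(test = n_test, family = n_family, genus = n_genus, undetermined = n_undet))
```

Under these assumptions, pcFamilyDet and pcGenusDet together should not exceed 100, since both are taken from the same test subset.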

NbSim

the number of simulations.

Results_Simulations

a boolean specifying whether the user wants to keep the results of the simulations.

parallel

a boolean specifying whether the user wants to speed up the loop by using parallelization.

Value

This function returns a list of 2 objects:

  • a list of objects of the class VernaBotaSims, resulting from the simulations

  • a dataset with one row per simulation: accuracy, scenario, and sampled test dataset

Details

This function performs the following steps NbSamples times:

  • Get the data,

  • Split between train and test set and remove taxonomic information from the test set (see SampleTestDataset function),

  • for each scenario: perform simulations (see SimFullCom function), compare the simulations with the original taxonomic information using the function CompareTaxo, and create an object of the class VernaBotaSims.
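The repeated sampling described in the steps above can be sketched with toy data in base R. This is not the package's implementation: the data, the single "scenario" and the helper predict_majority are all hypothetical, and the simulation step is replaced by a trivial majority-class guess so that the example runs on its own. pc2fill is again assumed to be on a 0-100 scale.

```r
set.seed(42)  # reproducible sampling

# Toy data: 200 individuals with a made-up "true" species label.
toy <- data.frame(
  id      = 1:200,
  species = sample(c("sp_A", "sp_B", "sp_C"), 200,
                   replace = TRUE, prob = c(0.6, 0.3, 0.1)),
  stringsAsFactors = FALSE
)

NbSamples <- 3
pc2fill   <- 10  # assumed 0-100 scale

# Hypothetical stand-in for the simulation step:
# always predict the most common species of the training set.
predict_majority <- function(train) names(which.max(table(train$species)))

results <- do.call(rbind, lapply(seq_len(NbSamples), function(s) {
  # Split into train and test; the test labels are hidden from the predictor.
  test_idx <- sample(nrow(toy), size = round(nrow(toy) * pc2fill / 100))
  train    <- toy[-test_idx, ]
  truth    <- toy$species[test_idx]

  # "Simulate", then compare with the original taxonomic information.
  pred <- rep(predict_majority(train), length(truth))
  data.frame(sample = s, accuracy = mean(pred == truth))
}))

print(results)
```

The resulting table has one row per sampled test dataset; in the real function there is additionally one row per scenario and per simulation.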