Perform simulation and compare different ways of sampling the data to fill
Source:R/CompareSample.R
This function runs the same simulations as CompareSim, but repeats them on several different samples of the dataset used as the test set.
Usage
CompareSample(
NbSamples = 3,
Param = NULL,
priors = NULL,
D2fill,
DAsso = NULL,
pc2fill = NULL,
pcFamilyDet = NULL,
pcGenusDet = NULL,
NbSim = 1,
Results_Simulations = FALSE,
parallel = FALSE
)
Arguments
- NbSamples
an integer: the number of test dataset samples to compare. All other parameters are the same as for the function CompareSim.
- Param
a datatable with the parameters for each scenario:
priors: a vector with the rank of the priors to use in the priors list
dataAsso: a vector with the rank of the observation data to use in the DAsso list (if DAsso is not provided, put 1)
weights: a vector with the weights of the priors
eps: a vector with the epsilon value for each scenario
Determ: a vector with the value of Determ (boolean)
- priors
a list of datasets containing the priors for each scenario. These datasets must have been prepared using the function PrepPrior. (Default is NULL: no prior information is used).
- D2fill
a dataset to fill. This dataset must have been prepared using the function PrepData.
- DAsso
a list of observation datasets used for each scenario. These datasets must have been prepared using the function PrepData (Default is NULL: the dataset D2fill is used to build the association matrix).
- pc2fill
the percentage of data (dataFill) to fill.
- pcFamilyDet
the percentage of data determined at the family level (from the subset of dataFill to fill). We recommend using a percentage similar to the one observed in the data to gap-fill.
- pcGenusDet
the percentage of data determined at the genus level (from the subset of dataFill to fill; the rest isn't determined at all). We recommend using a percentage similar to the one observed in the data to gap-fill.
- NbSim
the number of simulations.
- Results_Simulations
a boolean specifying whether the user wants to keep the results of the simulations.
- parallel
a boolean specifying if the user wants to speed up the loop by using parallelization.
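The Param argument is the least obvious one to build, so here is a minimal sketch of a three-scenario table. The column names come from the description above; every value is illustrative, and the table is built as a base data.frame only to stay dependency-free (if the function requires a data.table, it can be converted with data.table::as.data.table()).

```r
# Hypothetical scenario table: one row per scenario.
# Column names follow the Param description; all values are made up.
Param <- data.frame(
  priors   = c(1, 2, 2),             # rank of the prior in the `priors` list
  dataAsso = c(1, 1, 1),             # rank of the observation data in `DAsso`
                                     # (1 when DAsso is not provided)
  weights  = c(0.5, 0.5, 0.2),       # weight given to the prior
  eps      = c(0.01, 0.01, 0.05),    # epsilon value for the scenario
  Determ   = c(FALSE, FALSE, TRUE)   # value of Determ
)

# Basic sanity checks before passing the table on
stopifnot(
  is.numeric(Param$priors),
  is.numeric(Param$dataAsso),
  is.logical(Param$Determ)
)

Param
```

Each row is one scenario, so the ranks in priors and dataAsso should not exceed the lengths of the priors and DAsso lists.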
Value
This function returns a list of 2 objects:
a list of objects of the class VernaBotaSims, resulting from the simulations
a dataset with one row per simulation: accuracy, scenario, and sampled test dataset
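As an illustration of how the per-simulation dataset might be used, the sketch below averages accuracy per scenario and looks at how much it varies between test samples. The object `res` is a toy stand-in, and its column names (`sample`, `scenario`, `accuracy`) are assumptions, not the names actually returned by CompareSample.

```r
# Toy stand-in for the second returned object; column names are assumed.
# 3 test samples x 2 scenarios x 2 simulations each.
res <- data.frame(
  sample   = rep(1:3, each = 4),
  scenario = rep(rep(c("A", "B"), each = 2), times = 3),
  accuracy = c(0.70, 0.72, 0.80, 0.78,
               0.65, 0.67, 0.75, 0.77,
               0.71, 0.69, 0.82, 0.80)
)

# Mean accuracy of each scenario within each test sample
by_sample <- aggregate(accuracy ~ sample + scenario, data = res, FUN = mean)

# Across test samples: average level and spread for each scenario
mean_by_scenario <- tapply(by_sample$accuracy, by_sample$scenario, mean)
sd_by_scenario   <- tapply(by_sample$accuracy, by_sample$scenario, sd)

mean_by_scenario
sd_by_scenario
```

A scenario whose ranking changes from one test sample to the next is probably not reliably better than its competitors, which is the kind of check that repeating the sampling makes possible.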
Details
This function performs the following steps NbSamples times:
Get the data.
Split it into a training set and a test set, and remove taxonomic information from the test set (see the SampleTestDataset function).
For each scenario: perform the simulations (see the SimFullCom function), compare the simulated identifications with the original taxonomic information using the function CompareTaxo, and create an object of the class VernaBotaSims.
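The steps above can be sketched in base R on toy data. This is not the package's implementation: it does not call SampleTestDataset, SimFullCom or CompareTaxo, the "fill" step is a deliberately naive placeholder (most frequent species in the training rows), and the way pcFamilyDet and pcGenusDet are applied is one reading of the argument descriptions.

```r
set.seed(42)  # reproducible illustration

# Toy inventory in which every row is fully determined to species level
taxa <- data.frame(
  Family  = c("Fabaceae", "Fabaceae", "Lecythidaceae", "Sapotaceae"),
  Genus   = c("Inga", "Eperua", "Eschweilera", "Pouteria"),
  Species = c("Inga alba", "Eperua falcata",
              "Eschweilera coriacea", "Pouteria guianensis"),
  stringsAsFactors = FALSE
)
D <- taxa[sample(nrow(taxa), 200, replace = TRUE), ]
rownames(D) <- NULL

NbSamples   <- 3
pc2fill     <- 20  # % of rows moved to the test set
pcFamilyDet <- 25  # % of test rows that keep the family only
pcGenusDet  <- 25  # % of test rows that keep family and genus

accuracy <- numeric(NbSamples)
for (s in seq_len(NbSamples)) {
  # Split: draw the test rows without replacement
  n_test  <- round(nrow(D) * pc2fill / 100)
  test_id <- sample(nrow(D), n_test)
  n_fam   <- round(n_test * pcFamilyDet / 100)
  n_gen   <- round(n_test * pcGenusDet / 100)
  fam_id  <- test_id[seq_len(n_fam)]           # family kept, genus removed
  gen_id  <- test_id[n_fam + seq_len(n_gen)]   # family and genus kept
  none_id <- setdiff(test_id, c(fam_id, gen_id))  # nothing kept

  # Remove taxonomic information from the test rows
  masked <- D
  masked$Species[test_id]          <- NA
  masked$Genus[c(fam_id, none_id)] <- NA
  masked$Family[none_id]           <- NA

  # Placeholder "simulation": assign the most frequent training species
  train  <- masked[-test_id, ]
  guess  <- names(which.max(table(train$Species)))
  filled <- rep(guess, n_test)

  # Compare with the original taxonomic information
  accuracy[s] <- mean(filled == D$Species[test_id])
}

accuracy  # one accuracy value per test sample
```

In the real function the placeholder step is replaced by the scenario loop, so each test sample yields one accuracy value per scenario and per simulation rather than a single number.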