This function is used in the CompareSim and CompareSample functions.
It splits the data into train and test datasets, and removes the taxonomic information from the test dataset.
Usage
SampleTestDataset(dat, pc2fill, pcGenusDet, pcFamilyDet)
Arguments
- dat
The data to split
- pc2fill
The percentage of the data (dat) to fill, i.e. the share of trees placed in the test dataset
- pcGenusDet
The percentage of data determined at the genus level (from the subset of dataFill to fill; the rest isn't determined at all).
- pcFamilyDet
The percentage of data determined at the family level (from the subset of dataFill to fill).
Value
This function returns a list with 3 elements:
The dataset with some taxonomic information removed
The taxonomic information that has been removed
The corresponding tree ID (trees from the test dataset)
Details
This function:
Splits the data into train and test sets according to the parameter pc2fill (only fully identified trees are kept in the test set).
Removes taxonomic information from the test set (at the species, genus, or family level, according to the parameters pcFamilyDet and pcGenusDet).
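The two steps above can be illustrated with a minimal sketch in base R. It is not the package's implementation: the helper name sketch_sample_test and the column names idTree, Family, Genus and Species are hypothetical, the percentages are assumed to be on a 0-100 scale, and pcGenusDet and pcFamilyDet are assumed to describe disjoint shares of the test set, the remainder being left fully undetermined.

```r
# Hypothetical illustration only; not the code behind SampleTestDataset().
# Assumed columns: idTree, Family, Genus, Species. Assumed scale: 0-100.
sketch_sample_test <- function(dat, pc2fill, pcGenusDet, pcFamilyDet) {
  stopifnot(pcGenusDet + pcFamilyDet <= 100)

  # Only fully identified trees are eligible for the test set.
  full_id <- which(!is.na(dat$Family) & !is.na(dat$Genus) & !is.na(dat$Species))
  n_test <- min(length(full_id), round(nrow(dat) * pc2fill / 100))

  # Draw positions within full_id (avoids sample()'s length-one special case).
  test_rows <- full_id[sample.int(length(full_id), n_test)]

  # Decide at which level each test tree stays determined.
  n_genus <- round(n_test * pcGenusDet / 100)
  n_family <- min(n_test - n_genus, round(n_test * pcFamilyDet / 100))
  level <- rep(c("genus", "family", "none"),
               times = c(n_genus, n_family, n_test - n_genus - n_family))

  # Keep a copy of what is about to be removed.
  removed <- dat[test_rows, c("Family", "Genus", "Species")]

  masked <- dat
  masked$Species[test_rows] <- NA                   # species is always removed
  masked$Genus[test_rows[level != "genus"]] <- NA   # genus kept only at genus level
  masked$Family[test_rows[level == "none"]] <- NA   # family removed if undetermined

  # Same three elements as listed under Value.
  list(masked, removed, dat$idTree[test_rows])
}

# Toy data: ten fully identified trees.
set.seed(1)
dat <- data.frame(
  idTree = 1:10,
  Family = rep("Fabaceae", 10),
  Genus = rep("Inga", 10),
  Species = rep("alba", 10),
  stringsAsFactors = FALSE
)
res <- sketch_sample_test(dat, pc2fill = 40, pcGenusDet = 50, pcFamilyDet = 25)
```

In this sketch the species name is removed for every test tree, the genus for all but the genus-level share, and the family only for the trees left undetermined. The real function may draw or round these shares differently.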