
This function is used by the CompareSim and CompareSample functions. It splits the data into train and test datasets, and removes the taxonomic information from the test dataset.

Usage

SampleTestDataset(dat, pc2fill, pcGenusDet, pcFamilyDet)

Arguments

dat

The data to split

pc2fill

The percentage of the data (dat) to fill, i.e. the share placed in the test set, whose taxonomic information is removed.

pcGenusDet

The percentage of the data to fill that remains determined at the genus level (computed on the subset of data to fill; the rest isn't determined at all).

pcFamilyDet

The percentage of the data to fill that remains determined at the family level (computed on the subset of data to fill).

Value

This function returns a list with 3 elements:

  • The dataset with some taxonomic information removed

  • The taxonomic information that has been removed

  • The corresponding tree ID (trees from the test dataset)

Details

This function:

  • Splits the data into a train set and a test set according to the parameter pc2fill (only fully identified trees are kept in the test set).

  • Removes taxonomic information from the test set (at the species, genus, or family level, according to the parameters pcFamilyDet and pcGenusDet).
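The two steps above can be illustrated with a minimal base R sketch. This is not the package's implementation: the function name sketch_sample_test, the column names (idTree, Family, Genus, Species), the names of the returned list elements, and the toy data are all hypothetical. It also assumes that pc2fill is a percentage of all rows of dat, and that pcGenusDet and pcFamilyDet are disjoint shares of the test set, with the remainder left fully undetermined.

```r
# Hypothetical sketch of the behaviour described above, NOT the package code.
# Assumed columns: idTree, Family, Genus, Species (NA means "not determined").
sketch_sample_test <- function(dat, pc2fill, pcGenusDet, pcFamilyDet) {
  stopifnot(pcGenusDet + pcFamilyDet <= 100)

  # Only fully identified trees are eligible for the test set.
  full <- which(!is.na(dat$Family) & !is.na(dat$Genus) & !is.na(dat$Species))

  # Assumption: pc2fill is a percentage of all rows, capped at the eligible count.
  n_test <- min(length(full), round(nrow(dat) * pc2fill / 100))
  test_rows <- full[sample.int(length(full), n_test)]

  # Assign a remaining determination level to each sampled tree.
  # floor() guarantees that the two counts never exceed n_test.
  n_genus  <- floor(n_test * pcGenusDet / 100)
  n_family <- floor(n_test * pcFamilyDet / 100)
  level <- rep(c("genus", "family", "none"),
               times = c(n_genus, n_family, n_test - n_genus - n_family))

  # Keep a copy of the information that is about to be removed.
  removed <- dat[test_rows, c("idTree", "Family", "Genus", "Species")]

  masked <- dat
  masked$Species[test_rows] <- NA                   # species is always removed
  masked$Genus[test_rows[level != "genus"]] <- NA   # genus kept only at genus level
  masked$Family[test_rows[level == "none"]] <- NA   # family removed when undetermined

  list(masked = masked, removed = removed, idTest = dat$idTree[test_rows])
}

# Toy data, made up for illustration: 200 fully identified trees.
set.seed(1)
toy <- data.frame(
  idTree  = 1:200,
  Family  = rep(c("Fabaceae", "Lecythidaceae"), each = 100),
  Genus   = rep(c("Inga", "Eschweilera"), each = 100),
  Species = rep(c("alba", "coriacea"), each = 100),
  stringsAsFactors = FALSE
)
res <- sketch_sample_test(toy, pc2fill = 10, pcGenusDet = 20, pcFamilyDet = 30)
```

Under these assumptions, 20 trees end up in the test set: 4 keep their genus (and family), 6 keep only their family, and 10 lose all taxonomic information. The returned list mirrors the three elements described in the Value section.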
