adaSvmBenchmark {AdaSampling} | R Documentation |

`adaSvmBenchmark()`

compares the performance of an AdaSampling-enhanced SVM (support vector
machine) classifier against that of an SVM classifier on its own.
It requires a matrix of features (extracted from a labelled dataset)
and two label vectors: the true labels, and the same labels with noise added as desired.
It runs the SVM classifier and returns a matrix which displays the specificity
(Sp), sensitivity (Se) and F1 score for each of four conditions:
"Original" (classifying with true labels), "Baseline" (classifying with
noisy labels), "AdaSingle" (classifying using AdaSampling) and
"AdaEnsemble" (classifying using AdaSampling in conjunction with
an ensemble of models).
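The three reported metrics can be computed from a vector of predictions and a vector of true labels. The following is a minimal base R sketch, assuming labels are coded 1 (positive) and 0 (negative); `binary_metrics` is a hypothetical helper written for illustration and is not part of AdaSampling.

```r
# Sensitivity, specificity and F1 for binary labels coded 1 / 0.
# `binary_metrics` is a hypothetical helper, not an AdaSampling function.
binary_metrics <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  tp <- sum(pred == 1 & truth == 1)  # true positives
  tn <- sum(pred == 0 & truth == 0)  # true negatives
  fp <- sum(pred == 1 & truth == 0)  # false positives
  fn <- sum(pred == 0 & truth == 1)  # false negatives
  c(
    Se = tp / (tp + fn),             # sensitivity (recall)
    Sp = tn / (tn + fp),             # specificity
    F1 = 2 * tp / (2 * tp + fp + fn) # harmonic mean of precision and recall
  )
}

truth <- c(1, 1, 1, 0, 0, 0, 0, 1)
pred  <- c(1, 0, 1, 0, 1, 0, 0, 1)
binary_metrics(truth, pred)
# Se = 0.75, Sp = 0.75, F1 = 0.75
```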

adaSvmBenchmark(data.mat, data.cls, data.cls.truth, cvSeed, C = 50, sampleFactor = 1)

`data.mat` |
a rectangular matrix or data frame that can be coerced to a matrix, containing the features of the dataset, without class labels. Rownames (possibly containing unique identifiers) will be ignored. |

`data.cls` |
a numeric vector containing class labels for the dataset
with added noise.
Must be in the same order and of the same length as |

`data.cls.truth` |
a numeric vector of true class labels for
the dataset. Must be in the same order and of the same length as |

`cvSeed` |
sets the seed for cross-validation. |

`C` |
sets how many times to run the classifier, for the AdaEnsemble condition. See Description above. |

`sampleFactor` |
provides a control on the sample size for resampling. |
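Before calling the benchmark, it can help to confirm that the inputs have the shapes described above. The sketch below is a hypothetical pre-flight check in base R, not something the package provides. It assumes that each label vector holds one entry per row of `data.mat`, which the argument descriptions suggest but do not state outright.

```r
# Hypothetical pre-flight check; not part of the AdaSampling package.
check_benchmark_inputs <- function(data.mat, data.cls, data.cls.truth) {
  data.mat <- as.matrix(data.mat)  # data frames are coerced to a matrix
  stopifnot(
    is.numeric(data.mat),
    is.numeric(data.cls),
    is.numeric(data.cls.truth),
    length(data.cls) == length(data.cls.truth),
    # Assumption: one label per row of the feature matrix.
    length(data.cls) == nrow(data.mat)
  )
  invisible(data.mat)
}

# Toy inputs: four samples, two features.
toy.mat   <- data.frame(f1 = c(0.2, 1.5, 3.1, 0.7), f2 = c(1, 0, 1, 1))
toy.truth <- c(0, 1, 1, 0)
toy.noisy <- c(0, 1, 0, 0)  # third label flipped
check_benchmark_inputs(toy.mat, toy.noisy, toy.truth)
```

The check stops with an error on the first violated condition and otherwise returns the coerced matrix invisibly.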

AdaSampling is an adaptive sampling-based noise reduction method
for data with noisy class labels. It acts as a wrapper for
traditional classifiers, such as support vector machines,
k-nearest neighbours, logistic regression, and linear discriminant
analysis. For more details see `?adaSample()`.
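To make the wrapper idea concrete, the sketch below shows one simplified form of adaptive sampling around logistic regression, using base R only. It is an illustration under stated assumptions and not the package's implementation: labels are assumed to be coded 0/1, each class is resampled with probability proportional to the current estimate that an instance's label is correct, and `ada_sketch` is a hypothetical name. The procedure in `adaSample()` may differ in its details.

```r
# Simplified adaptive sampling around logistic regression (base R only).
# `ada_sketch` is hypothetical and is NOT the AdaSampling implementation.
ada_sketch <- function(x, y, iterations = 5) {
  df  <- data.frame(y = y, x)
  pos <- which(y == 1)
  neg <- which(y == 0)
  trust <- rep(1, length(y))  # start by trusting every label equally
  for (i in seq_len(iterations)) {
    # Resample each class, favouring instances whose labels look reliable.
    idx <- c(
      sample(pos, length(pos), replace = TRUE, prob = trust[pos]),
      sample(neg, length(neg), replace = TRUE, prob = trust[neg])
    )
    # glm may warn if the resampled data become nearly separable.
    fit <- suppressWarnings(glm(y ~ ., data = df[idx, ], family = binomial()))
    p1  <- predict(fit, newdata = df, type = "response")
    # Trust in a label = predicted probability of the labelled class.
    trust <- pmax(ifelse(y == 1, p1, 1 - p1), 1e-6)
  }
  unname(p1)  # estimated probability of the positive class
}

# Synthetic data: two features, 20% of the labels flipped at random.
set.seed(1)
n <- 200
x <- matrix(rnorm(2 * n), ncol = 2, dimnames = list(NULL, c("f1", "f2")))
truth <- as.numeric(x[, 1] + x[, 2] + rnorm(n) > 0)
noisy <- truth
flip  <- sample(n, 0.2 * n)
noisy[flip] <- 1 - noisy[flip]

p <- ada_sketch(x, noisy)
mean((p > 0.5) == truth)  # agreement with the true labels
```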

This function evaluates the AdaSampling procedure by adding noise
to a labelled dataset, and then running support vector machines on
the original and the noisy dataset. Note that this function is for
benchmarking AdaSampling performance using what is assumed to be
a well-labelled dataset. In order to run AdaSampling on a noisy dataset,
please see `adaSample()`.

A performance matrix giving the sensitivity (Se), specificity (Sp) and F1 score for each of the four conditions.

Yang, P., Liu, W., Yang, J. (2017) Positive unlabeled learning via wrapper-based
adaptive sampling. *International Joint Conferences on Artificial Intelligence (IJCAI)*, 3272-3279

Yang, P., Ormerod, J., Liu, W., Ma, C., Zomaya, A., Yang, J. (2018)
AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications.
*IEEE Transactions on Cybernetics*, doi:10.1109/TCYB.2018.2816984

# Load the example dataset
data(brca)
head(brca)

# First, clean up the dataset to transform into the required format.
brca.mat <- apply(X = brca[,-10], MARGIN = 2, FUN = as.numeric)
brca.cls <- sapply(X = brca$cla, FUN = function(x) {ifelse(x == "malignant", 1, 0)})
rownames(brca.mat) <- paste("p", 1:nrow(brca.mat), sep="_")

# Introduce 40% noise to positive class and 30% noise to the negative class
set.seed(1)
pos <- which(brca.cls == 1)
neg <- which(brca.cls == 0)
brca.cls.noisy <- brca.cls
brca.cls.noisy[sample(pos, floor(length(pos) * 0.4))] <- 0
brca.cls.noisy[sample(neg, floor(length(neg) * 0.3))] <- 1

# benchmark classification performance with different approaches
adaSvmBenchmark(data.mat = brca.mat, data.cls = brca.cls.noisy, data.cls.truth = brca.cls, cvSeed=1)

[Package *AdaSampling* version 1.3 Index]