Support Vector Machine (SVM)

SVM is used for classification and regression analysis.

1. Introduction

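SVM solves the following optimization problem over $N$ training samples $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$:

$$\min_{w}\ \frac{\lambda}{2}\|w\|_2^2 + \frac{1}{N}\sum_{i=1}^{N}\max\left(0,\ 1 - y_i\, w^{\top}x_i\right)$$

where $\frac{1}{2}\|w\|_2^2$ is the regularization term; $\lambda$ is the regularization coefficient; $\max\left(0,\ 1 - y_i\, w^{\top}x_i\right)$ is the hinge loss, as visualized below:

[Figure: the hinge loss]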
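As a minimal sketch, the functions below evaluate an L2-regularized hinge-loss objective of the form $\frac{\lambda}{2}\|w\|_2^2 + \frac{1}{N}\sum_i \max(0, 1 - y_i w^{\top}x_i)$. The exact scaling (the factor 1/2 and the averaging over $N$) is an assumption, and the names `dot`, `hinge_loss` and `svm_objective` are hypothetical. Samples are stored sparsely as `{feature_index: value}` dicts, and labels are assumed to be -1 or +1.

```python
# Sparse samples are dicts {feature_index: value}; labels are assumed to be -1 or +1.


def dot(w, x):
    """Inner product of a dense weight list w and a sparse sample x."""
    return sum(w[j] * v for j, v in x.items())


def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - y * w^T x) for one sample."""
    return max(0.0, 1.0 - y * dot(w, x))


def svm_objective(w, samples, labels, lam):
    """(lam / 2) * ||w||^2 plus the hinge loss averaged over all samples (assumed scaling)."""
    reg = 0.5 * lam * sum(wj * wj for wj in w)
    loss = sum(hinge_loss(w, x, y) for x, y in zip(samples, labels)) / len(samples)
    return reg + loss
```

A sample whose margin $y_i w^{\top}x_i$ is at least 1 contributes zero loss, so only samples inside or on the wrong side of the margin affect the objective beyond the regularizer.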

2. Distributed Implementation on Angel

Angel MLLib solves the SVM objective with mini-batch gradient descent; the algorithm is shown below:

[Figure: mini-batch gradient descent algorithm for SVM]
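The loop below is a single-process sketch of mini-batch (sub)gradient descent for SVM, assuming an objective of the form $\frac{\lambda}{2}\|w\|_2^2$ plus the mean hinge loss, with labels in {-1, +1}. It does not model how Angel distributes the work across workers and parameter servers, and `minibatch_svm` and its arguments are hypothetical, not Angel's API. Because the hinge loss is not differentiable at margin 1, the code uses the standard subgradient: $-y_i x_i$ when $y_i w^{\top}x_i < 1$, and 0 otherwise.

```python
import random


def minibatch_svm(samples, labels, dim, epochs, batch_size, lr, lam, seed=0):
    """Mini-batch subgradient descent on (lam/2)||w||^2 + mean hinge loss (sketch)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the regularizer (lam/2)||w||^2 is lam * w.
            grad = [lam * wj for wj in w]
            for i in batch:
                x, y = samples[i], labels[i]
                margin = y * sum(w[j] * v for j, v in x.items())
                if margin < 1.0:
                    # Hinge loss is active: add its subgradient -y * x, averaged over the batch.
                    for j, v in x.items():
                        grad[j] -= y * v / len(batch)
            # One model update per mini-batch.
            w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

Each mini-batch produces exactly one update of `w`, which is the quantity that a parameter such as "ml.num.update.per.epoch" counts.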

3. Execution and Performance

Input Format

  • Data format is set by "ml.data.type", which supports the "libsvm" and "dummy" types. For details, see Angel Data Format.

  • The feature vector's dimension is set by "ml.feature.num".
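A minimal parsing sketch for the two formats, assuming libsvm lines of the form `label index:value ...` and dummy lines of the form `label index index ...`, where every listed feature implicitly has value 1. The function names are hypothetical, and indices are kept exactly as written (no 0-based/1-based conversion). `to_pm1`, which maps a label to the -1/+1 convention used by the hinge loss, is an assumed preprocessing step rather than documented Angel behavior; see Angel Data Format for the authoritative definitions.

```python
def parse_libsvm(line):
    """Parse 'label idx:val idx:val ...' into (label, {idx: val})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx)] = float(val)
    return label, features


def parse_dummy(line):
    """Parse 'label idx idx ...' into (label, {idx: 1.0}); listed features are binary."""
    parts = line.split()
    label = float(parts[0])
    return label, {int(idx): 1.0 for idx in parts[1:]}


def to_pm1(label):
    """Assumed mapping to SVM labels: positive labels become +1, everything else -1."""
    return 1.0 if label > 0 else -1.0


print(parse_libsvm("1 3:0.5 7:2"))  # (1.0, {3: 0.5, 7: 2.0})
```

Storing features as a sparse dict keeps memory proportional to the number of non-zero features rather than to "ml.feature.num".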

Parameters

  • Algorithm Parameters

    • ml.epoch.num: number of epochs
    • ml.batch.sample.ratio: fraction of the training data sampled in each epoch
    • ml.num.update.per.epoch: number of mini-batches (model updates) in each epoch
    • ml.data.validate.ratio: fraction of the data held out for validation; set to 0 to disable validation
    • ml.learn.rate: initial learning rate
    • ml.learn.decay: decay rate of the learning rate
    • ml.svm.reg.l2: coefficient of the L2 penalty
  • I/O Parameters

    • angel.train.data.path: input path of the training data
    • angel.predict.data.path: input path of the data to predict
    • ml.feature.num: number of features
    • ml.data.type: Angel Data Format, supporting "dummy" and "libsvm"
    • angel.save.model.path: save path for the trained model
    • angel.predict.out.path: output path for the prediction results
    • angel.log.path: save path for the log
  • Resource Parameters

    • angel.workergroup.number: number of workers
    • angel.worker.memory.mb: memory requested for each worker, in MB
    • angel.worker.task.number: number of tasks on each worker (default: 1)
    • angel.ps.number: number of parameter servers (PS)
    • angel.ps.memory.mb: memory requested for each PS, in MB
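To make the relationship between the algorithm parameters concrete, the sketch below derives a training schedule from them: hold out "ml.data.validate.ratio" of the data, sample "ml.batch.sample.ratio" of the remaining training data each epoch, and split that sample into "ml.num.update.per.epoch" mini-batches. This is a reading of the parameter descriptions, not Angel's code. In particular, the decay rule $\eta_t = \eta_0 / \sqrt{1 + d\,t}$ used for "ml.learn.decay" is a hypothetical choice, and `training_schedule` is an illustrative name.

```python
import math


def training_schedule(num_samples, params):
    """Derive split sizes, mini-batch size and per-epoch learning rates (sketch)."""
    n_val = int(num_samples * params["ml.data.validate.ratio"])
    n_train = num_samples - n_val
    # Samples drawn in each epoch, divided evenly across that epoch's updates.
    per_epoch = int(n_train * params["ml.batch.sample.ratio"])
    batch_size = max(1, per_epoch // params["ml.num.update.per.epoch"])
    lr0 = params["ml.learn.rate"]
    decay = params["ml.learn.decay"]
    # Hypothetical decay rule (assumption): lr_t = lr0 / sqrt(1 + decay * t).
    rates = [lr0 / math.sqrt(1.0 + decay * t) for t in range(params["ml.epoch.num"])]
    return {
        "n_train": n_train,
        "n_val": n_val,
        "batch_size": batch_size,
        "learning_rates": rates,
    }
```

For example, with 1,000 samples, "ml.data.validate.ratio" = 0.1, "ml.batch.sample.ratio" = 0.5 and "ml.num.update.per.epoch" = 10, the sketch yields 900 training samples, 100 validation samples, and mini-batches of 45 samples.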

Performance