Vowpal Wabbit (VW) is a data mining tool that can learn from terafeature datasets with ease. Via parallel learning, it can exceed the throughput of any single machine's network interface when doing linear learning, a first amongst learning algorithms. It implements several optimization and classification algorithms, for example Support Vector Machines (SVM), online stochastic gradient descent, and logistic regression. VW also provides an active learning interface, which can be used either to simulate the active learning algorithm or to deploy active learning.

Installation

There are two ways to install Vowpal Wabbit:

  • Using the package manager : In Ubuntu 14.04, the Vowpal Wabbit package is already available in the package manager. You can install it using the Synaptic package manager or by running the following command in the terminal:
    sudo apt-get install vowpal-wabbit
    ** Note that the version in the repository is usually an older one.
  • Installing from source : Follow the step-by-step instructions for building the source files from here.
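
After either installation route, it can be useful to confirm which build ended up on the PATH, since the packaged version is usually older. The following is a minimal sketch, assuming the executable is installed under the name vw; the helper name vw_version is made up for this example.

```python
import shutil
import subprocess


def vw_version(executable="vw"):
    """Return the version text reported by the binary, or None if it is not on PATH.

    The default name "vw" is an assumption about how the package was installed.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # Ask the binary for its version; do not raise if it exits with an error code.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    version = vw_version()
    if version is None:
        print("vw not found on PATH")
    else:
        print("vw version: " + version)
```

If the reported version is older than expected, building from source is the way to get a newer one.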

Active Learning

The active learning algorithm implemented in VW is described in the paper Importance Weighted Active Learning.
To run classification or an active learning simulation, see the command line arguments listed here.
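
To give a feel for the idea behind importance weighted active learning, here is a minimal sketch of its core sampling step: each incoming example is queried with some probability p, and a queried example is kept with importance weight 1/p so that the weighted sample stays unbiased. This is not VW's implementation. The function iwal_select, the query_prob and oracle callbacks, and the p_min floor are all assumptions made for illustration; the paper derives the query probability from the current set of hypotheses, which is left out here.

```python
import random


def iwal_select(stream, query_prob, oracle, p_min=0.1, seed=0):
    """Importance-weighted label sampling over a stream of examples.

    stream     -- iterable of unlabeled examples
    query_prob -- hypothetical callback returning a query probability for x
    oracle     -- hypothetical callback returning the true label for x
    p_min      -- lower bound on the query probability (keeps weights bounded)
    Returns a list of (example, label, importance_weight) tuples.
    """
    rng = random.Random(seed)
    labeled = []
    for x in stream:
        # Clamp the probability into [p_min, 1].
        p = max(p_min, min(1.0, query_prob(x)))
        # Flip a biased coin; only queried examples cost a label.
        if rng.random() < p:
            labeled.append((x, oracle(x), 1.0 / p))
    return labeled


if __name__ == "__main__":
    examples = list(range(20))
    # Toy rule: query examples near the "boundary" at 10 more often.
    chosen = iwal_select(
        examples,
        query_prob=lambda x: 1.0 if abs(x - 10) < 3 else 0.2,
        oracle=lambda x: int(x >= 10),
    )
    for x, y, w in chosen:
        print(x, y, w)
```

Examples that are queried rarely carry a large weight when they are queried, which is what compensates for the labels that were skipped.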

Evaluation

Vowpal Wabbit does not report its classification results the way other classification tools do. To get the confusion matrix for classification results computed with Vowpal Wabbit, you can use the following Python code, which takes the predictions file and the file containing the true labels as arguments:

import sys,csv
import numpy as np
from sklearn.metrics import (accuracy_score,
    confusion_matrix, classification_report,
    precision_score, recall_score, f1_score)

def main(y_pred_file,y_true_file):

    ## Create list for predicted class
    #remove last line if using csoaa.
    lines = [line.strip() for line in open(y_pred_file)]
    y_pred = [float(item) for item in lines]
    # y_pred = y_pred[:-1] 

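    ## Create list for true class
    # the label is the first token on each line (also works for multi-digit labels)
    lines = [line.strip() for line in open(y_true_file)]
    y_true = [float(item.split('|')[0].split()[0]) for item in lines]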
    
    ## Define labels for classes
    target_names = ['sitting', 'sittingdown', 'standing', 'standingup', 'walking']

    ## Compute evaluation metrics
    acc = accuracy_score(y_true, y_pred)
    cm = confusion_matrix(y_true,y_pred)
    # cr = classification_report(y_true,y_pred,target_names=target_names)

    ## Print results
    print('%-10s %12.5f' % ('accuracy:',acc))
    print('confusion matrix:')
    print(cm)
    # print(cr)

if __name__=='__main__':
    sys.exit(main(sys.argv[1],sys.argv[2]))
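
To make the layout of the resulting matrix concrete, here is a small self-contained sketch that builds the same kind of matrix by hand, without scikit-learn. Rows correspond to the true class and columns to the predicted class, which is the convention confusion_matrix uses. The sample lines are made up for illustration, and the helpers parse_vw_label and confusion_counts are hypothetical names; the sketch assumes the true label is the first token before the "|" in each data line.

```python
from collections import Counter


def parse_vw_label(line):
    """Return the class label from one VW-style line such as '3 | f1:0.2 f2:1'."""
    return int(float(line.split("|")[0].split()[0]))


def confusion_counts(y_true, y_pred, labels):
    """Build a matrix where entry [i][j] counts true labels[i] predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up sample data: four training-style lines and four prediction lines.
data_lines = ["1 | a:0.1", "2 | a:0.7", "2 | a:0.9", "3 | a:0.4"]
pred_lines = ["1.000000", "2.000000", "3.000000", "3.000000"]

y_true = [parse_vw_label(line) for line in data_lines]
y_pred = [int(float(line.split()[0])) for line in pred_lines]

matrix = confusion_counts(y_true, y_pred, labels=[1, 2, 3])
for row in matrix:
    print(row)
# Prints:
# [1, 0, 0]
# [0, 1, 1]
# [0, 0, 1]
```

The off-diagonal 1 in the second row is the class 2 example that was predicted as class 3.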