M2ROC Guideline Page

 

Input

The input of this tool is typically a CSV (Comma-Separated Values) file containing an OTU table.

OTU table


An OTU (operational taxonomic unit) table is a matrix that gives the number of reads per sample per OTU. Each OTU is an operational definition used to classify groups of closely related individuals. With an OTU table, techniques such as machine learning, feature selection and network analysis can be applied to better understand the relations between closely related individuals and to identify each individual more accurately.

As an example, we sampled from the Iris flower data set. In the figure below, each row is one of the 15 samples we selected, which cover three different classes. Each column (except the last, which holds the class label) is an attribute. For the Iris data set, the four attributes are sepal length, sepal width, petal length and petal width (in cm), respectively. Your input CSV file should follow this sample format.

Notice: Most errors during execution of the web server are caused by submitting data in the wrong format. If you run into execution errors, please check your input first. The input should be in CSV (Comma-Separated Values) format.
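A quick local check of the file before uploading can catch many format problems. The sketch below is only an illustration and is not part of the web server: the function name check_table, the presence of a header row, and the header names are assumptions made for the example. It checks that every row has the same number of fields and that every column except the last (the class label) is numeric.

```python
import csv
import io

def check_table(text):
    """Return a list of format problems found in CSV text (empty if none).

    Assumes (hypothetically) one header row, numeric attribute columns,
    and the class label in the last column.
    """
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip empty lines
    if len(rows) < 2:
        return ["file needs a header row and at least one sample row"]
    problems = []
    width = len(rows[0])
    if width < 2:
        problems.append("need at least one attribute column plus a class label")
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {i}: expected {width} fields, found {len(row)}")
            continue
        for value in row[:-1]:  # every column except the class label
            try:
                float(value)
            except ValueError:
                problems.append(f"row {i}: non-numeric attribute value {value!r}")
                break
    return problems

# Three Iris samples; the header names are illustrative only.
sample = """sepal_length,sepal_width,petal_length,petal_width,class
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
print(check_table(sample))
```

An empty list means no problem was found by these particular checks; it does not guarantee that the server will accept the file.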

 

Procedure

1. Matrix to AUC:

We take the matrix (OTU table) provided in the input file, rank the attributes with five attribute ranking methods (Gain Ratio, Information Gain, mRMR, Relief and System Uncertainty), and measure the performance of each method using classifiers (SVM, Naive Bayes and Random Forest) with the leave-one-out cross-validation (LOOCV) strategy. The AUC figure created in this step shows the AUC values for different numbers of top-ranked attributes, so the user can choose a near-optimal number of attributes for further processing.
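The idea of measuring AUC with LOOCV for an increasing number of top-ranked attributes can be sketched as follows. This is a minimal illustration, not the server's implementation: it assumes two classes labelled 0 and 1, it uses a simple nearest-centroid scorer as a stand-in for the SVM, Naive Bayes and Random Forest classifiers, and the ranking list is a hypothetical output of one ranking method. All function names are made up for the example.

```python
def auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive sample scores higher (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def centroid_score(train_x, train_y, x):
    """Stand-in classifier: distance to the class-0 centroid minus distance
    to the class-1 centroid, so a higher score means 'more like class 1'."""
    def centroid(label):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    return dist(x, centroid(0)) - dist(x, centroid(1))

def loocv_auc(X, y, columns):
    """Hold each sample out once, score it with a model built on the rest,
    then compute one AUC over all held-out scores."""
    sub = [[row[c] for c in columns] for row in X]
    scores = []
    for i in range(len(sub)):
        train_x = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(centroid_score(train_x, train_y, sub[i]))
    return auc(scores, y)

# Toy data (made up): column 0 separates the classes, column 1 is noise.
# Each class needs at least two samples so a centroid exists after hold-out.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.0], [3.2, 2.0], [2.9, 5.5]]
y = [0, 0, 0, 1, 1, 1]
ranking = [0, 1]  # hypothetical ranking: best attribute first

for k in range(1, len(ranking) + 1):
    print(k, loocv_auc(X, y, ranking[:k]))
```

Plotting the printed AUC against k gives a curve of the same kind as the AUC figure described above, from which a suitable number of attributes can be read off.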

2. Matrix to ROC:

This step is identical to step 1, except that only the selected attributes in the matrix are used. The number of attributes is decided by the user: after step 1, you will be asked to provide the number of attributes you would like to include, based on the AUC figure. After ranking, the top N features (N as indicated by the user) are used for further processing.
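Keeping only the top N ranked attributes can be pictured as below. The scores, attribute names and function names are hypothetical and serve only to illustrate the selection; the actual scores depend on the ranking method used.

```python
def top_n(scores, n):
    """Return the names of the n highest-scoring attributes, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]

def keep_columns(header, rows, chosen):
    """Reduce each row of the matrix to the chosen attribute columns."""
    idx = [header.index(name) for name in chosen]
    return [[row[i] for i in idx] for row in rows]

# Hypothetical scores from one ranking method (higher is better).
scores = {"sepal_length": 0.40, "sepal_width": 0.15,
          "petal_length": 0.90, "petal_width": 0.85}
header = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
rows = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4]]

chosen = top_n(scores, 2)
print(chosen)
print(keep_columns(header, rows, chosen))
```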

 

 

If you choose to average the results, one ROC curve for each attribute ranking method will be shown in the figure, based on the number of features chosen. There will be 5 ROC curves in one figure.

If you don’t average the results, there will be 5 figures, each representing one feature selection method.

 

3. Venn Diagram

Each attribute ranking method has its own algorithm for selecting attributes, so when the top N attributes are selected, different ranking methods usually produce different attribute sets.

By creating a Venn diagram of the five attribute sets, we can easily visualize the number of features that are selected in common by different attribute selection methods. The details of the sets and shared subsets can be downloaded from the link provided alongside the Venn diagram.
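The shared subsets behind such a diagram are plain set intersections. The sketch below uses made-up attribute names for the five methods and a hypothetical helper, shared_subsets, and lists the attributes common to every combination of two or more methods. The layout of the server's downloadable file may differ.

```python
from itertools import combinations

def shared_subsets(selected):
    """Map each combination of two or more methods to the attributes
    that all methods in the combination selected (empty ones omitted)."""
    result = {}
    names = sorted(selected)
    for r in range(2, len(names) + 1):
        for combo in combinations(names, r):
            common = set.intersection(*(selected[m] for m in combo))
            if common:
                result[combo] = sorted(common)
    return result

# Hypothetical top-3 selections for each ranking method.
selected = {
    "Gain Ratio": {"OTU_1", "OTU_2", "OTU_3"},
    "Information Gain": {"OTU_1", "OTU_2", "OTU_4"},
    "mRMR": {"OTU_1", "OTU_5", "OTU_6"},
    "Relief": {"OTU_1", "OTU_2", "OTU_7"},
    "System Uncertainty": {"OTU_1", "OTU_3", "OTU_8"},
}

common_to_all = set.intersection(*selected.values())
print(common_to_all)
for combo, attrs in shared_subsets(selected).items():
    print(" & ".join(combo), "->", attrs)
```

Note that these are overlapping intersections; the regions of a Venn diagram are usually drawn as exclusive areas, so the counts in a figure can be smaller than the sizes of the intersections listed here.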

 

 

4. Bar