Runs the MSMICA algorithm for metabolite identification on a metabolomics feature table, using the KEGG database.

Usage

MSMICA_algorithm(
  met_raw_wide,
  class_file = NULL,
  LC = "HILIC",
  LC_run_time,
  mz_threshold = 10,
  biospecimen = "Blood",
  hmdb_detection_preference = TRUE,
  All_Adduct = c("M+H", "M+Na", "M+2Na-H", "M+H-H2O", "M+H-NH3", "M+ACN+H", "M+ACN+2H",
    "2M+H", "M+2H", "M+H-2H2O"),
  metabolite_database = "KEGG_HMDB",
  reaction_database = c("mammalia"),
  backpropagation_correlation_direction = "positive",
  adduct_correlation_r_threshold = 0.39,
  adduct_correlation_time_threshold = 6,
  isotopic_correlation_r_threshold = 0.71,
  isotopic_correlation_time_threshold = 4,
  imputation_method = "half_min",
  prefix = "",
  ion_mode = "positive",
  detail = FALSE,
  save_unidentified = FALSE,
  progress_log = FALSE
)

Arguments

met_raw_wide

a metabolomics feature table in wide format with mz as the first column, time as the second column, and intensity values as the remaining columns.
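As an illustration of the layout described above, here is a minimal sketch of a wide feature table. The sample column names, the intensity values, and the assumption that time is recorded in seconds are all hypothetical; only the column order (mz, time, then intensities) comes from the description.

```r
# Hypothetical toy feature table: 3 features measured in 4 samples.
# Column order follows the description: mz, time, then one column per sample.
met_raw_wide <- data.frame(
  mz        = c(147.0764, 169.0584, 180.0634),
  time      = c(45.2, 45.9, 120.4),  # assumed to be in seconds
  sample_01 = c(15200, 3100, NA),
  sample_02 = c(14800, 2950, 870),
  sample_03 = c(16100, 3300, 910),
  sample_04 = c(15500, NA, 890)
)

# Basic layout checks before passing the table on.
stopifnot(
  identical(names(met_raw_wide)[1:2], c("mz", "time")),
  all(vapply(met_raw_wide, is.numeric, logical(1)))
)
```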

class_file

a class file in wide format with the first column name as metabolomics raw file name, the second column name as subject ID, and the third column name as class label (study sample or reference standard sample).

LC

a character value indicating which liquid chromatography (LC) column is used to predict the retention time of the metabolites. Default is "HILIC" (hydrophilic interaction liquid chromatography). Other options are "RP" or "C18" (reversed-phase liquid chromatography, also called C18).

LC_run_time

a numeric value indicating the run time of the liquid chromatography, in minutes. The signature above sets no default value, so pass it explicitly, e.g. 5 for a 5-minute run.

mz_threshold

the m/z threshold for the metabolite identification. Default is 10 ppm.
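A ppm threshold is a relative mass tolerance. The sketch below shows the usual ppm calculation; the helper names `ppm_error` and `within_ppm` are hypothetical, and the package may apply the threshold differently internally.

```r
# Mass error in parts per million between an observed and a theoretical m/z.
ppm_error <- function(observed_mz, theoretical_mz) {
  (observed_mz - theoretical_mz) / theoretical_mz * 1e6
}

# TRUE when the absolute ppm error is within the threshold.
within_ppm <- function(observed_mz, theoretical_mz, mz_threshold = 10) {
  abs(ppm_error(observed_mz, theoretical_mz)) <= mz_threshold
}

within_ppm(180.0640, 180.0634)  # about 3.3 ppm off, inside 10 ppm
within_ppm(180.0660, 180.0634)  # about 14.4 ppm off, outside 10 ppm
```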

biospecimen

a character value indicating the biospecimen of the study samples. Default is "Blood". The other options include "Urine", "Feces", "Cerebrospinal Fluid", "Saliva", "Breast Milk", "Sweat", "Cellular Cytoplasm", "Amniotic Fluid", "Aqueous Humour", "Ascites Fluid", "Lymph", "Tears", "Bile", "Semen", and "Pericardial Effusion".

hmdb_detection_preference

a logical value indicating whether to use the HMDB detection preference for the adduct identification. Default is TRUE. If TRUE, only the metabolites noted as "detected" in the HMDB database are used by the MSMICA algorithm. If FALSE, all the metabolites specified in metabolite_database are used.

All_Adduct

the adduct forms of the metabolites. Default is c("M+H","M+Na","M+2Na-H","M+H-H2O","M+H-NH3","M+ACN+H","M+ACN+2H","2M+H","M+2H","M+H-2H2O") for the positive mode. This includes primary and secondary adducts.
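To make the adduct labels concrete, the sketch below computes the expected m/z of a few positive-mode adducts from a neutral monoisotopic mass. The function `adduct_mz` is hypothetical, it covers only four of the default adducts, and the mass constants are standard reference values rather than values taken from the package.

```r
proton <- 1.007276   # mass of a proton (Da)
sodium <- 22.989218  # mass of a sodium cation, Na+ (Da)

# Expected m/z for a neutral monoisotopic mass under a given adduct label.
adduct_mz <- function(neutral_mass, adduct) {
  switch(adduct,
    "M+H"  = neutral_mass + proton,
    "M+Na" = neutral_mass + sodium,
    "M+2H" = (neutral_mass + 2 * proton) / 2,  # doubly charged
    "2M+H" = 2 * neutral_mass + proton,        # protonated dimer
    stop("adduct not covered by this sketch: ", adduct)
  )
}

# Example with a neutral mass of 146.069142 Da.
sapply(c("M+H", "M+Na", "M+2H", "2M+H"), adduct_mz, neutral_mass = 146.069142)
```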

metabolite_database

a character value indicating the metabolite database to be used. Default is "KEGG_HMDB".

reaction_database

a character vector specifying the reaction database to be used. Default is c("mammalia"). The other option is c("general").

backpropagation_correlation_direction

the direction of the backpropagation precursor-product/transporter correlation coefficient. Default is "positive". The other option is "both". If "positive", only positive backpropagation correlation coefficients are used. If "both", both positive and negative backpropagation correlation coefficients are used.

adduct_correlation_r_threshold

the correlation threshold for adduct correlation analysis. Default is 0.39 (Spearman correlation).

adduct_correlation_time_threshold

the retention time threshold for adduct correlation analysis. Default is 6 (seconds).

isotopic_correlation_r_threshold

the correlation threshold for isotopic correlation analysis. Default is 0.71 (Spearman correlation).

isotopic_correlation_time_threshold

the retention time threshold for isotopic correlation analysis. Default is 4 (seconds).
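The adduct and isotopic thresholds above each pair a Spearman correlation cutoff with a retention time window. The sketch below shows one plausible way such a pair of thresholds could be combined for two features. The function `candidate_pair`, the use of `>=` and `<=`, and the requirement that both conditions hold are assumptions, not the package's documented logic.

```r
# Hypothetical check: are two features correlated across samples and
# close enough in retention time (seconds) to be related?
candidate_pair <- function(int_a, int_b, time_a, time_b,
                           r_threshold = 0.39, time_threshold = 6) {
  r <- cor(int_a, int_b, method = "spearman", use = "pairwise.complete.obs")
  close_in_time <- abs(time_a - time_b) <= time_threshold
  isTRUE(r >= r_threshold) && close_in_time
}

# Toy intensities for two features across four samples.
a <- c(15200, 14800, 16100, 15500)
b <- c(3100, 2950, 3300, 3150)

candidate_pair(a, b, time_a = 45.2, time_b = 45.9)   # same rank order, 0.7 s apart
candidate_pair(a, b, time_a = 45.2, time_b = 120.4)  # same rank order, 75.2 s apart
```

For the isotopic case, the same shape applies with `r_threshold = 0.71` and `time_threshold = 4`.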

imputation_method

the method to be used for missing value imputation. Default is "half_min". Other options are "QRILC" (QRILC is better for triplicate samples) and NA. If NA, then no imputation is performed.
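As a rough illustration of what half-minimum imputation usually means, the sketch below replaces each missing intensity with half of the smallest observed value for that feature. The function `half_min_impute` is hypothetical, and whether the package takes the minimum per feature or over the whole table is not stated here, so the per-feature choice is an assumption.

```r
# Replace NA in each row (feature) with half of that row's minimum.
# Assumes every row has at least one observed value.
half_min_impute <- function(intensity) {
  t(apply(intensity, 1, function(x) {
    x[is.na(x)] <- min(x, na.rm = TRUE) / 2
    x
  }))
}

# Toy intensity matrix: 2 features (rows) by 3 samples (columns).
intensity <- rbind(
  c(15200, 14800, NA),
  c(NA, 870, 910)
)

half_min_impute(intensity)
```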

prefix

a prefix to be added to the output files. Default is "".

ion_mode

the ionization mode of the metabolomics data. Default is "positive". The other option is "negative".

detail

a logical value indicating whether to save the intermediate results as CSV files. Default is FALSE. Warning: this can create thousands of files and consume a lot of disk space. Use with caution.

save_unidentified

a logical value indicating whether the unidentified features should be saved. Default is FALSE.

progress_log

a logical value indicating whether to save the log of all printed output and messages to a text file. Default is FALSE.
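Putting the arguments together, a call might be assembled as in the sketch below. The toy table, the prefix, and the chosen values are hypothetical. The call is guarded so the snippet does nothing unless `MSMICA_algorithm` is already available in the session, and a table this small is unlikely to produce meaningful identifications.

```r
# Hypothetical toy feature table (mz, time, then intensities).
toy_table <- data.frame(
  mz        = c(147.0764, 169.0584),
  time      = c(45.2, 45.9),
  sample_01 = c(15200, 3100),
  sample_02 = c(14800, 2950)
)

# Arguments named exactly as in the usage signature; values are illustrative.
msmica_args <- list(
  met_raw_wide      = toy_table,
  LC                = "HILIC",
  LC_run_time       = 5,       # minutes; no default in the signature
  mz_threshold      = 10,      # ppm
  biospecimen       = "Blood",
  ion_mode          = "positive",
  imputation_method = "half_min",
  prefix            = "demo_", # hypothetical output prefix
  detail            = FALSE    # avoid writing many intermediate files
)

# Only run when the function is available (package loaded by the user).
if (exists("MSMICA_algorithm", mode = "function")) {
  result <- do.call(MSMICA_algorithm, msmica_args)
}
```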