
Getting Started


library(MSMICA)

# ── Step 1: Load your feature table ──────────────────────────────────────────
# The feature table must have m/z as column 1, retention time (seconds) as column 2, and per-sample intensities in the remaining columns.
data(feature_table_exp_hilicpos)
data(feature_table_exp_c18neg)

# print the feature table
print(feature_table_exp_hilicpos)
print(feature_table_exp_c18neg)

# ── Step 2: QC filter ─────────────────────────────────────────────────────────
# Remove features that appear in fewer than 20% of samples. Intensity columns are assumed to start at column 3.
feature_table_exp_hilicpos <- QC_filter(
    x = feature_table_exp_hilicpos,
    metabolite_start_column = 3,
    minimum_sample_appear = 0.20
)

feature_table_exp_c18neg <- QC_filter(
    x = feature_table_exp_c18neg,
    metabolite_start_column = 3,
    minimum_sample_appear = 0.20
)

# print the feature table after QC filter
print(feature_table_exp_hilicpos)
print(feature_table_exp_c18neg)

# Set the working directory for the MSMICA output files (replace the placeholder with your own path)
setwd("path/to/your/output/directory")

# ── Step 3: Run MSMICA ────────────────────────────────────────────────────────
# Select one ion mode at a time and provide the appropriate adduct list.
## mode can be "positive" or "negative"
## sample_type can be "fluid" or "tissue"
tissue_positive_adducts <- msmica_adducts(mode = "positive", sample_type = "tissue")
tissue_negative_adducts <- msmica_adducts(mode = "negative", sample_type = "tissue")

# print the adducts
print(tissue_positive_adducts)
print(tissue_negative_adducts)
# run MSMICA algorithm for tissue positive mode
tissue_positive_MSMICA_results <- MSMICA_algorithm(
    met_raw_wide    = feature_table_exp_hilicpos,
    LC              = "HILIC",    # chromatography
    LC_run_time     = 5,          # minutes
    mz_threshold    = 10,
    ion_mode        = "positive",
    All_Adduct      = tissue_positive_adducts,
    biospecimen     = "Blood",    # default; "Urine" is the other well-characterized option. Other types are not well documented, so "Blood" may be a reasonable fallback.
    reaction_database = c("mammalia"),
    prefix          = "MSMICA_test_hilicpos"
)

# run MSMICA algorithm for tissue negative mode
tissue_negative_MSMICA_results <- MSMICA_algorithm(
    met_raw_wide    = feature_table_exp_c18neg,
    LC              = "C18",      # chromatography
    LC_run_time     = 5,          # minutes
    mz_threshold    = 10,
    ion_mode        = "negative",
    All_Adduct      = tissue_negative_adducts,
    biospecimen     = "Blood",
    reaction_database = c("mammalia"),
    prefix          = "MSMICA_test_c18neg"
)

# print the MSMICA results
print(tissue_positive_MSMICA_results)
print(tissue_negative_MSMICA_results)
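
The prevalence filter in Step 2 can be illustrated with a small base R sketch. This is not the implementation of QC_filter(); the helper name `prevalence_filter`, the toy data, and the assumption that NA or 0 means "not detected" are all hypothetical and only serve to show the idea of keeping features detected in at least a given fraction of samples.

```r
# Toy feature table in the layout described in Step 1:
# column 1 = m/z, column 2 = retention time (s), columns 3+ = sample intensities.
toy <- data.frame(
  mz   = c(180.0634, 203.0526, 255.2330),
  time = c(45.2, 45.9, 210.4),
  S1   = c(1200, NA, 560),
  S2   = c(1500, NA, 610),
  S3   = c(NA,   NA, 590),
  S4   = c(1100, NA, NA),
  S5   = c(1300, 90, 640)
)

# Hypothetical helper (not part of MSMICA): keep features detected in at
# least `min_fraction` of samples. Assumption: NA or 0 means "not detected".
prevalence_filter <- function(x, start_col = 3, min_fraction = 0.20) {
  intens   <- as.matrix(x[, start_col:ncol(x), drop = FALSE])
  detected <- !is.na(intens) & intens > 0
  x[rowMeans(detected) >= min_fraction, , drop = FALSE]
}

# With a 50% cutoff, the second feature (detected in 1 of 5 samples) is dropped.
filtered <- prevalence_filter(toy, start_col = 3, min_fraction = 0.50)
print(filtered)
```

For real data, QC_filter() should be preferred, since its handling of missing values may differ from the assumption made here.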

Function Reference


| Function | Description |
|---|---|
| MSMICA_algorithm() | Main entry point. Runs the full three-stage identification pipeline. |
| QC_filter() | Removes low-prevalence features (those that appear in fewer than x% of samples). |
| msmica_adducts() | Returns preset adduct vectors for common ion mode and sample type combinations. |
| find.Overlapping.mzs() | Fast ppm-based m/z matching between two feature tables using data.table. |
| custom_biochemical_reaction_loading() | Loads the bundled curated biochemical reaction dataset. |

All other functions in the package are internal helpers called automatically by MSMICA_algorithm().
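
To make the idea of ppm-based m/z matching concrete, here is a minimal base R sketch. It is not the code of find.Overlapping.mzs() (which uses data.table); the helper name `match_mz_ppm` and the choice to measure the error relative to the second vector are assumptions made for illustration. As a rule of thumb, a 10 ppm tolerance at m/z 200 corresponds to an absolute window of 0.002.

```r
# Hypothetical helper (not MSMICA's find.Overlapping.mzs()): return index pairs
# (i, j) where mz_a[i] and mz_b[j] agree within `ppm` parts per million.
match_mz_ppm <- function(mz_a, mz_b, ppm = 10) {
  # Pairwise differences as a length(mz_a) x length(mz_b) matrix.
  diff_abs <- abs(outer(mz_a, mz_b, "-"))
  # Assumption: the ppm error is measured relative to the value in mz_b.
  ppm_err  <- diff_abs / rep(mz_b, each = length(mz_a)) * 1e6
  hits     <- which(ppm_err <= ppm, arr.ind = TRUE)
  data.frame(
    index_a = hits[, "row"],
    index_b = hits[, "col"],
    ppm     = ppm_err[hits]
  )
}

mz_a <- c(180.0634, 203.0526, 255.2330)
mz_b <- c(180.0640, 203.0600, 300.0000)

# Only the first pair lies within 10 ppm; the second differs by roughly 36 ppm.
matches <- match_mz_ppm(mz_a, mz_b, ppm = 10)
print(matches)
```

The all-pairs matrix keeps the sketch short but scales quadratically, so it is only suitable for small tables.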

Key Parameters


| Parameter | Default | Description | Alternative Options |
|---|---|---|---|
| mz_threshold | 10 | m/z matching tolerance in ppm. Use 10 ppm with high-resolution instruments (including Orbitrap MS). | User-defined numeric threshold based on instrument performance. |
| LC | "HILIC" | LC column type for RT prediction. | "RP" or "C18". |
| LC_run_time | (none) | Total LC run time in minutes (required). | Any positive numeric runtime in minutes. |
| biospecimen | "Blood" | Biospecimen type used for HMDB concentration priors. Only Blood and Urine are well characterized. Other biospecimen types are not well documented, so using Blood may be a good alternative. | "Urine", "Feces", "Cerebrospinal Fluid", "Saliva", "Breast Milk", "Sweat", "Cellular Cytoplasm", "Amniotic Fluid", "Aqueous Humour", "Ascites Fluid", "Lymph", "Tears", "Bile", "Semen", "Pericardial Effusion". |
| ion_mode | "positive" | Ionization mode. | "negative". |
| All_Adduct | msmica_adducts("positive", "fluid") | Adduct forms considered for matching. | Use msmica_adducts(mode, sample_type) for presets: mode = "positive" or "negative"; sample_type = "fluid" or "tissue". You can also provide a custom character vector. |
| adduct_correlation_r_threshold | 0.39 | Spearman correlation threshold for adduct correlation analysis. | User-defined numeric threshold (typically between 0 and 1). |
| adduct_correlation_time_threshold | 6 | Retention-time threshold (seconds) for adduct correlation analysis. | User-defined positive numeric value in seconds. |
| isotopic_correlation_r_threshold | 0.71 | Spearman correlation threshold for isotopic correlation analysis. | User-defined numeric threshold (typically between 0 and 1). |
| isotopic_correlation_time_threshold | 4 | Retention-time threshold (seconds) for isotopic correlation analysis. | User-defined positive numeric value in seconds. |
| reaction_database | "mammalia" | Biochemical reaction database(s) for precursor-product scoring. | "general". |
| imputation_method | "half_min" | Missing-value imputation method. | "QRILC" or NA (no imputation). |
| detail | FALSE | Save intermediate CSVs (warning: 10+ large files, hundreds of MB). | TRUE. |
| progress_log | FALSE | Write all messages to a .txt log file. | TRUE. |
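
The correlation and retention-time thresholds can be pictured with a short base R sketch. It assumes, for illustration only, that two features are treated as a candidate adduct pair when their retention times differ by no more than the time threshold and their Spearman correlation across samples reaches the r threshold; how MSMICA_algorithm() actually combines these criteria is not shown here. The helper name `passes_thresholds` and the toy intensities are hypothetical.

```r
# Toy intensities across 8 samples.
# Features A and B co-vary and elute close together; feature C elutes much later.
base   <- c(10, 25, 14, 40, 31, 18, 55, 22)
feat_a <- base * 100
feat_b <- base * 30 + c(5, -3, 4, -6, 2, -1, 3, -2)
feat_c <- c(900, 400, 1200, 300, 700, 1500, 200, 800)
rt     <- c(A = 45.2, B = 47.0, C = 210.4)   # retention times in seconds

# Hypothetical helper (not part of MSMICA). Defaults mirror the documented
# adduct thresholds: Spearman r of 0.39 and a 6-second retention-time window.
passes_thresholds <- function(x, y, rt_x, rt_y, r_min = 0.39, rt_max = 6) {
  r <- cor(x, y, method = "spearman", use = "pairwise.complete.obs")
  (abs(rt_x - rt_y) <= rt_max) && (r >= r_min)
}

ab <- passes_thresholds(feat_a, feat_b, rt[["A"]], rt[["B"]])  # close in RT, same rank order
ac <- passes_thresholds(feat_a, feat_c, rt[["A"]], rt[["C"]])  # fails the RT window
print(c(A_B = ab, A_C = ac))
```

Under the same assumption, the isotopic thresholds (0.71 and 4 seconds) would simply be passed as `r_min` and `rt_max`.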

Adduct Presets


msmica_adducts() returns preset adduct vectors for common ion mode and sample type combinations. These presets are meant as convenient starting points; users can still provide a custom character vector to All_Adduct.

| Preset | Exact adduct vector |
|---|---|
| msmica_adducts("positive", "fluid") | c("M+H", "M+Na", "M+2Na-H", "M+H-H2O", "M+H-NH3", "M+ACN+H", "M+ACN+2H", "2M+H", "M+2H", "M+H-2H2O") |
| msmica_adducts("negative", "fluid") | c("M-H", "M+Cl", "M+FA-H", "M+Hac-H", "M-H+HCOONa", "M+Na-2H", "M-2H", "2M-H", "M+ACN-H") |
| msmica_adducts("positive", "tissue") | c("M+H", "M+K", "M+2K-H", "M+H-H2O", "M+H-NH3", "M+ACN+H", "M+ACN+2H", "2M+H", "M+2H", "M+H-2H2O") |
| msmica_adducts("negative", "tissue") | c("M-H", "M+Cl", "M+FA-H", "M+Hac-H", "M-H+HCOOK", "M-2H", "2M-H", "M+ACN-H", "M+K-2H") |
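
As a small illustration of building a custom adduct vector, the sketch below copies the two positive-mode presets listed above as plain character vectors (so it runs without the package), compares them, and combines them. The variable names are hypothetical, and whether a merged list suits a given dataset is a judgment call left to the user.

```r
# Positive-mode presets, copied from the listing above.
fluid_pos <- c("M+H", "M+Na", "M+2Na-H", "M+H-H2O", "M+H-NH3",
               "M+ACN+H", "M+ACN+2H", "2M+H", "M+2H", "M+H-2H2O")
tissue_pos <- c("M+H", "M+K", "M+2K-H", "M+H-H2O", "M+H-NH3",
                "M+ACN+H", "M+ACN+2H", "2M+H", "M+2H", "M+H-2H2O")

# Adducts that differ between the two presets: sodium forms in the fluid
# preset are replaced by potassium forms in the tissue preset.
only_tissue <- setdiff(tissue_pos, fluid_pos)
only_fluid  <- setdiff(fluid_pos, tissue_pos)
print(only_tissue)
print(only_fluid)

# A custom vector covering both sodium and potassium adducts.
custom_pos <- union(fluid_pos, tissue_pos)
print(custom_pos)

# The custom vector could then be supplied in place of a preset, e.g.
# MSMICA_algorithm(..., All_Adduct = custom_pos)
```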