## Getting Started
```r
library(MSMICA)

# ── Step 1: Load your feature table ──────────────────────────────────────────
# The feature table must have m/z in column 1, retention time (seconds) in column 2,
# and per-sample intensities in the remaining columns.
data(feature_table_exp_hilicpos)  # example HILIC positive-mode feature table
data(feature_table_exp_c18neg)    # example C18 negative-mode feature table

# Preview the first rows of each feature table
head(feature_table_exp_hilicpos)
head(feature_table_exp_c18neg)
```
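If you are building your own feature table instead of using the bundled examples, the sketch below creates a toy table in the layout described above and runs a few sanity checks on it. Only the column order is specified by MSMICA; the column names, the made-up values, and the helper `check_feature_table()` are hypothetical and not part of the package.

```r
# Toy feature table in the documented layout (made-up values):
# column 1 = m/z, column 2 = retention time (s), columns 3+ = per-sample intensities.
# Column names are illustrative assumptions; only the column order is documented.
toy_features <- data.frame(
  mz       = c(104.1070, 132.1019, 203.0526),
  time     = c(45.2, 61.8, 45.1),
  sample_1 = c(1.2e6, 8.1e4, NA),
  sample_2 = c(9.8e5, 6.9e4, 2.3e5),
  sample_3 = c(NA, 7.4e4, 1.9e5)
)

# Hypothetical helper (not part of MSMICA): basic checks on the expected layout.
check_feature_table <- function(x) {
  stopifnot(is.data.frame(x), ncol(x) >= 3)                       # m/z, RT, >= 1 sample
  stopifnot(is.numeric(x[[1]]), all(x[[1]] > 0, na.rm = TRUE))    # m/z values
  stopifnot(is.numeric(x[[2]]), all(x[[2]] >= 0, na.rm = TRUE))   # RT in seconds
  intensities <- x[, -(1:2), drop = FALSE]
  stopifnot(all(vapply(intensities, is.numeric, logical(1))))     # numeric intensities
  invisible(TRUE)
}

check_feature_table(toy_features)
```

A table that fails any check stops with an error, which is easier to debug than a failure deep inside the pipeline.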
```r
# ── Step 2: QC filter ─────────────────────────────────────────────────────────
# Remove features detected in fewer than 20% of samples.
# Per-sample intensity columns start at column 3 (after m/z and retention time).
feature_table_exp_hilicpos <- QC_filter(
  x = feature_table_exp_hilicpos,
  metabolite_start_column = 3,
  minimum_sample_appear = 0.20
)
feature_table_exp_c18neg <- QC_filter(
  x = feature_table_exp_c18neg,
  metabolite_start_column = 3,
  minimum_sample_appear = 0.20
)

# Preview the feature tables after QC filtering
head(feature_table_exp_hilicpos)
head(feature_table_exp_c18neg)
```
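To illustrate what a prevalence filter such as `QC_filter()` does, the base-R sketch below keeps features detected in at least 20% of samples. It assumes a feature "appears" in a sample when its intensity is non-missing and greater than zero; the package's exact rule may differ, so treat this as a conceptual sketch, not a reimplementation. `prevalence_filter()` is a hypothetical name.

```r
# Hypothetical prevalence filter (a sketch, not MSMICA's QC_filter()).
# Assumption: a feature "appears" in a sample when its intensity is non-missing and > 0.
prevalence_filter <- function(x, start_column = 3, min_fraction = 0.20) {
  intensities <- as.matrix(x[, start_column:ncol(x), drop = FALSE])
  detected <- !is.na(intensities) & intensities > 0
  fraction_detected <- rowMeans(detected)   # share of samples with a detection
  x[fraction_detected >= min_fraction, , drop = FALSE]
}

# Toy table: 3 features x 5 samples (made-up values)
toy <- data.frame(
  mz   = c(100.0757, 200.1281, 300.1805),
  time = c(30, 95, 140),
  s1 = c(5e4, NA, NA),
  s2 = c(4e4, NA, NA),
  s3 = c(NA, NA, 1e4),
  s4 = c(NA, 0, NA),
  s5 = c(3e4, NA, NA)
)

filtered <- prevalence_filter(toy, start_column = 3, min_fraction = 0.20)
filtered$mz  # feature 1 (3/5 samples) and feature 3 (1/5) pass; feature 2 (0/5) is dropped
```

Note that a feature detected in exactly 20% of samples is kept here because the sketch uses `>=`; whether `QC_filter()` treats the boundary the same way is not documented in this vignette.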
```r
# Set the directory for MSMICA output files (change output_dir to your own path)
output_dir <- file.path(getwd(), "MSMICA_output")
dir.create(output_dir, showWarnings = FALSE)
setwd(output_dir)
```
```r
# ── Step 3: Run MSMICA ────────────────────────────────────────────────────────
# Process one ion mode at a time and supply the matching adduct list.
## mode can be "positive" or "negative"
## sample_type can be "fluid" or "tissue"
tissue_positive_adducts <- msmica_adducts(mode = "positive", sample_type = "tissue")
tissue_negative_adducts <- msmica_adducts(mode = "negative", sample_type = "tissue")

# Print the adduct lists
print(tissue_positive_adducts)
print(tissue_negative_adducts)
```
```r
# Run MSMICA on the HILIC positive-mode tissue data
tissue_positive_MSMICA_results <- MSMICA_algorithm(
  met_raw_wide = feature_table_exp_hilicpos,
  LC = "HILIC",           # chromatography
  LC_run_time = 5,        # total LC run time in minutes
  mz_threshold = 10,      # m/z tolerance in ppm
  ion_mode = "positive",
  All_Adduct = tissue_positive_adducts,
  biospecimen = "Blood",  # default; "Urine" is the only other well-characterized option.
                          # For other specimen types, "Blood" may be a reasonable substitute.
  reaction_database = c("mammalia"),
  prefix = "MSMICA_test_hilicpos"
)

# Run MSMICA on the C18 negative-mode tissue data
tissue_negative_MSMICA_results <- MSMICA_algorithm(
  met_raw_wide = feature_table_exp_c18neg,
  LC = "C18",             # chromatography
  LC_run_time = 5,        # total LC run time in minutes
  mz_threshold = 10,      # m/z tolerance in ppm
  ion_mode = "negative",
  All_Adduct = tissue_negative_adducts,
  biospecimen = "Blood",
  reaction_database = c("mammalia"),
  prefix = "MSMICA_test_c18neg"
)

# Print the MSMICA results
print(tissue_positive_MSMICA_results)
print(tissue_negative_MSMICA_results)
```

## Function Reference
| Function | Description |
|---|---|
| `MSMICA_algorithm()` | Main entry point. Runs the full three-stage identification pipeline. |
| `QC_filter()` | Removes low-prevalence features, i.e. features detected in fewer than a given fraction of samples (`minimum_sample_appear`). |
| `msmica_adducts()` | Returns preset adduct vectors for common ion mode and sample type combinations. |
| `find.Overlapping.mzs()` | Fast ppm-based m/z matching between two feature tables using `data.table`. |
| `custom_biochemical_reaction_loading()` | Loads the bundled curated biochemical reaction dataset. |

All other functions in the package are internal helpers called automatically by `MSMICA_algorithm()`.
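The ppm criterion behind `find.Overlapping.mzs()` (and the `mz_threshold` parameter) can be illustrated in base R. The sketch below does not call the package function, whose signature is not documented here; it brute-forces all pairs and keeps those whose error, in parts per million of the first m/z, is within the tolerance. `match_mz_ppm()` is hypothetical, and using the first vector's m/z as the ppm denominator is an assumption.

```r
# Hypothetical ppm matcher (illustrative only; not find.Overlapping.mzs()).
# Keeps every pair (i, j) whose error |mz_a[i] - mz_b[j]| / mz_a[i] * 1e6 is <= ppm_tol.
match_mz_ppm <- function(mz_a, mz_b, ppm_tol = 10) {
  pairs <- expand.grid(i = seq_along(mz_a), j = seq_along(mz_b))
  ppm_error <- abs(mz_a[pairs$i] - mz_b[pairs$j]) / mz_a[pairs$i] * 1e6
  keep <- ppm_error <= ppm_tol
  hits <- data.frame(i = pairs$i[keep], j = pairs$j[keep], ppm_error = ppm_error[keep])
  hits <- hits[order(hits$i, hits$j), , drop = FALSE]
  rownames(hits) <- NULL
  hits
}

observed  <- c(181.0707, 203.0530, 350.2000)  # made-up m/z values
reference <- c(181.0712, 203.0526, 250.1000)
matches <- match_mz_ppm(observed, reference, ppm_tol = 10)
matches  # observed 1 ~ reference 1 (~2.8 ppm); observed 2 ~ reference 2 (~2.0 ppm)
```

The all-pairs comparison is fine for a toy example but scales as the product of the two table sizes; a join-based approach such as the `data.table` one mentioned above is far better suited to real feature tables.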
## Key Parameters

| Parameter | Default | Description | Alternative options |
|---|---|---|---|
| `mz_threshold` | `10` | m/z matching tolerance in ppm. 10 ppm is appropriate for high-resolution instruments, including Orbitrap MS. | Any numeric tolerance suited to your instrument's mass accuracy. |
| `LC` | `"HILIC"` | LC column type, used for retention-time prediction. | `"RP"` or `"C18"` |
| `LC_run_time` | None (required) | Total LC run time in minutes. | Any positive number of minutes. |
| `biospecimen` | `"Blood"` | Biospecimen type used for HMDB concentration priors. Only Blood and Urine are well characterized; the other types are sparsely documented, so `"Blood"` may be a reasonable substitute for them. | `"Urine"`, `"Feces"`, `"Cerebrospinal Fluid"`, `"Saliva"`, `"Breast Milk"`, `"Sweat"`, `"Cellular Cytoplasm"`, `"Amniotic Fluid"`, `"Aqueous Humour"`, `"Ascites Fluid"`, `"Lymph"`, `"Tears"`, `"Bile"`, `"Semen"`, `"Pericardial Effusion"` |
| `ion_mode` | `"positive"` | Ionization mode. | `"negative"` |
| `All_Adduct` | `msmica_adducts("positive", "fluid")` | Adduct forms considered for matching. | A preset from `msmica_adducts(mode, sample_type)`, with `mode = "positive"` or `"negative"` and `sample_type = "fluid"` or `"tissue"`; or a custom character vector. |
| `adduct_correlation_r_threshold` | `0.39` | Spearman correlation threshold for adduct correlation analysis. | Any numeric threshold, typically between 0 and 1. |
| `adduct_correlation_time_threshold` | `6` | Retention-time threshold (seconds) for adduct correlation analysis. | Any positive number of seconds. |
| `isotopic_correlation_r_threshold` | `0.71` | Spearman correlation threshold for isotopic correlation analysis. | Any numeric threshold, typically between 0 and 1. |
| `isotopic_correlation_time_threshold` | `4` | Retention-time threshold (seconds) for isotopic correlation analysis. | Any positive number of seconds. |
| `reaction_database` | `"mammalia"` | Biochemical reaction database(s) used for precursor-product scoring. | `"general"` |
| `imputation_method` | `"half_min"` | Missing-value imputation method. | `"QRILC"` or `NA` (no imputation) |
| `detail` | `FALSE` | Save intermediate CSV files. Warning: this writes 10+ large files, potentially hundreds of MB. | `TRUE` |
| `progress_log` | `FALSE` | Write all messages to a `.txt` log file. | `TRUE` |
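The adduct and isotopic correlation parameters each pair a Spearman correlation cutoff with a retention-time window. As a conceptual sketch only (not MSMICA's internal code), two features can be flagged as candidate partners when they co-elute within the time threshold and their intensities across samples correlate at or above the r threshold. The defaults below (0.39 and 6 s) are the adduct values from the table; `correlated_pairs()` and the toy data are hypothetical, and treating the cutoffs as inclusive is an assumption.

```r
# Hypothetical sketch (not MSMICA's internal code): flag feature pairs that
# co-elute (|RT difference| <= time_threshold seconds) and co-vary across samples
# (Spearman rho >= r_threshold). Defaults mirror the adduct correlation parameters.
correlated_pairs <- function(rt, intensities, r_threshold = 0.39, time_threshold = 6) {
  # rt: retention times in seconds, one per feature
  # intensities: numeric matrix, one row per feature, one column per sample
  out <- data.frame(a = integer(0), b = integer(0), rho = numeric(0))
  n <- length(rt)
  if (n < 2) return(out)
  for (a in seq_len(n - 1)) {
    for (b in (a + 1):n) {
      if (abs(rt[a] - rt[b]) > time_threshold) next  # outside the RT window
      rho <- cor(intensities[a, ], intensities[b, ], method = "spearman")
      if (!is.na(rho) && rho >= r_threshold) {
        out <- rbind(out, data.frame(a = a, b = b, rho = rho))
      }
    }
  }
  out
}

# Toy data: 4 features x 6 samples (made-up values)
rt <- c(60, 63, 64, 200)
intensities <- matrix(c(
  10, 20, 30, 40, 50, 60,   # feature 1
  12, 25, 33, 41, 55, 70,   # feature 2: same ranking as feature 1, co-elutes
  60, 50, 40, 30, 20, 10,   # feature 3: co-elutes but anti-correlated
  11, 21, 31, 41, 51, 61    # feature 4: correlated but elutes much later
), nrow = 4, byrow = TRUE)

pairs_found <- correlated_pairs(rt, intensities)
pairs_found  # only the pair (1, 2) passes both criteria
```

For the isotopic analogue, the same sketch would be called with `r_threshold = 0.71, time_threshold = 4`.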
## Adduct Presets

`msmica_adducts()` returns preset adduct vectors for common ion mode and sample type combinations. These presets are meant as convenient starting points; you can still pass a custom character vector to `All_Adduct`.

| Preset | Exact adduct vector |
|---|---|
| `msmica_adducts("positive", "fluid")` | `c("M+H", "M+Na", "M+2Na-H", "M+H-H2O", "M+H-NH3", "M+ACN+H", "M+ACN+2H", "2M+H", "M+2H", "M+H-2H2O")` |
| `msmica_adducts("negative", "fluid")` | `c("M-H", "M+Cl", "M+FA-H", "M+Hac-H", "M-H+HCOONa", "M+Na-2H", "M-2H", "2M-H", "M+ACN-H")` |
| `msmica_adducts("positive", "tissue")` | `c("M+H", "M+K", "M+2K-H", "M+H-H2O", "M+H-NH3", "M+ACN+H", "M+ACN+2H", "2M+H", "M+2H", "M+H-2H2O")` |
| `msmica_adducts("negative", "tissue")` | `c("M-H", "M+Cl", "M+FA-H", "M+Hac-H", "M-H+HCOOK", "M-2H", "2M-H", "M+ACN-H", "M+K-2H")` |
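Each adduct label maps a neutral monoisotopic mass M to an expected m/z via m/z = (n × M + Δ) / |z|, where n is the number of M units, Δ the net mass shift, and z the charge. The sketch below applies this to a handful of labels from the presets using standard monoisotopic shift values (electron mass already accounted for). These constants are not taken from MSMICA's internal tables, which may use slightly different values; `adduct_table` and `adduct_mz()` are hypothetical.

```r
# Hypothetical helper: expected m/z for a neutral monoisotopic mass M.
# m/z = (n_M * M + delta) / charge, with standard monoisotopic adduct shifts.
adduct_table <- data.frame(
  adduct = c("M+H", "M+Na", "M+K", "M+2H", "2M+H", "M+H-H2O",
             "M-H", "M+Cl", "M-2H", "2M-H"),
  n_M    = c(1, 1, 1, 1, 2, 1, 1, 1, 1, 2),             # number of M units
  delta  = c(1.007276, 22.989218, 38.963158, 2.014552, 1.007276, -17.003289,
             -1.007276, 34.969402, -2.014552, -1.007276),  # net mass shift (Da)
  charge = c(1, 1, 1, 2, 1, 1, 1, 1, 2, 1)              # absolute charge
)

adduct_mz <- function(M, adducts, table = adduct_table) {
  rows <- match(adducts, table$adduct)
  if (anyNA(rows)) {
    stop("Unknown adduct(s): ", paste(adducts[is.na(rows)], collapse = ", "))
  }
  (table$n_M[rows] * M + table$delta[rows]) / table$charge[rows]
}

glucose_mass <- 180.063388  # monoisotopic mass of C6H12O6
labels <- c("M+H", "M+Na", "M-H", "2M+H")
setNames(adduct_mz(glucose_mass, labels), labels)
```

Covering a full preset would require shift values for every label in it (for example `M+ACN+H` or `M-H+HCOONa`), which this small table deliberately omits.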