16 LC-HRMS MSe raw data preprocessing with MS-DIAL

This tutorial will guide you through the different steps to process raw data from LC MSe HRMS using MS-DIAL. You should also check the online tutorial provided by the developers: https://mtbinfo-team.github.io/mtbinfo.github.io/MS-DIAL/tutorial

You first need to convert the .raw files from GC orbitrap HRMS to the open .mzml format using MSConvert according to @ref(MSConvertGC).

16.1 Some recommended values for G2 XS qtof:

MS1 tolerance: 0.01
MS2 tolerance: 0.01
Maximum charge number: 2
Consider Cl Br: Yes
Minimum peak height: 10000
Mass slice width: 0.05
Smoothing method: Linear weighted moving average
Smooth level: 3
Minimum peak width: 5
MSMS ab cutoff: should be >500 Da to get rid of noisy background peaks to the spectrum
Exlude after precursor: Yes
Keep isotopic ions until: 8 (but varies if Cl and Br will be used for mass defect plots? Maybe higher value (10?) for negative mode?? NEED TO CHECK!!)
Keep isotopic ions w/o MS2Dec: Yes

CorrDec settings
MS2 tolerance: 0.01 Da
Min MS2 peak intensity:100 amplitude (this should be tested against individual features to optimize)
Min number of detected samples: 3 (depends on the total number of samples, should have many samples to be efficient)
Excl highly correlated spots: 1 (to also include precursor ion in the MS2)
Min. corr coefficient (MS2): 0.85
Margin 1: unsure
Margin 2: unsure
Min detected rate: 0.5 (default but unsure what it means, percent or fraction?)
Min MS2 rel intensity: 2 % Remove peaks larger than precursor: Yes

How to export CorrDec spectra? http://www.metabolomics-forum.com/index.php?topic=1473.0

16.2 Post processing

16.3 Further explanation for some terminologies apart from the tutorial (https://mtbinfo-team.github.io/mtbinfo.github.io/MS-DIAL/tutorial)

Gap fill: The peak less than “minimum abundance” parameter of peak detection tab will be not detected even though there is a peak, and it can be gap-filled for the alignment process if the peak having the same RT and m/z is detected in other samples.

From http://www.metabolomics-forum.com/index.php?topic=1469.0:

In the alignment process, this program will make a master peak list (in a worst case) like:

RT: 1.05 min, m/z: 100.01
RT: 1.1 min, m/z: 100.015
RT: 1.2 min, m/z: 100.03.

Then, a peak having e.g. RT: 1.1 min and m/z: 100.02 in a file e.g. (Q) will be aligned to (B) of the master list. After that, this program will recognize that the file (Q) does not have the certain peaks for (A) and (C), and then the program will perform the gap filling method to fill the values. However, for example, when users use an RT tolerance as 0.1 min and an m/z tolerance as 0.01 Da for the alignment tolerance, newly created peak for a master list’s (A) in the file (Q) should be nearly equal to what the file (Q) has for the master peak list’s (B) because the extracted ion chromatogram for the gap fill process for peak (A) should be drawn by the tolerances of RT=1.05 min +/ 0.1 min, m/z=100.01+/- 0.01. Because of these, MS-DIAL is supposed to export very similar peak height/area values for the master peak list’s (A) and (B) in the result of file (Q). Here, (B) is recognized as “detected” and (A) is recognized as “gap-filled” although the origin of peak is the same between them. You can know the peak origins (detected or gap-filled) in the ‘peak id’ matrix from the alignment result export option where the “-1” value is described when the peak value is inserted by the gap-filled process.

Mass slice width: depending on the processing settings, sometimes we observe the same compound splitting into 2-3 different features, and we think the mass slice width (in Peak detection) may be one of the parameters driving this effect (in addition to the alignment tolerance across samples).

How can we properly to choose a slice width that fits our data, i.e. neither a too wide nor too narrow? For high-res Orbitrap a width of 0.05 Da is suggested. As an example, attached is an internal standard run on our LC Orbitrap (acquired at 140.000 resolution, profile).

From my understanding of Tsugawa et al 2015 (Peak spotting) https://www.ncbi.nlm.nih.gov/pubmed/25938372 and the math presentation describing the MS-DIAL peak detection algorithm (default 0.1 Da), it seems MS-DIAL would correct the merging of BPCs across different m/z widths, somewhat similarly to XCMS? Below, from Smith et al 2006 (Peak detection) https://pubs.acs.org/doi/10.1021/ac051437y

“An important detail is the relationship between spectral peak width and slice width.

if the peak width is larger than slice width: the signal from a single peak may bleed across multiple slices. Low-res MS produce peak widths greater than the XMCS default 0.1 m/z slice. The MEND peak detection algorithm uses a scoring function to assess whether a chromatographic peak is also at the maximum of a spectral peak, preemptively removing such bleed. Instead of eliminating spurious extra peaks during detection, our algorithm uses a post-processing step that descends through the peak list by intensity, eliminating any peaks in the vicinity (0.7 m/z) of higher intensity peaks.
if the peak width is smaller than the slice width: high-res TOF or Fourier transform MS often exhibit such behavior. In that case, depending on the scan-to-scan precision of the instrument, the signal from an analyte may oscillate between adjacent slices over chromatographic time, making an otherwise smooth peak shape appear jagged. Based on operator knowledge of the mass spectrometer characteristics, we optionally combine the maximum signal intensity from adjacent slices into overlapping EIBPCs (i.e., 100.0/100.1, 100.1/100.2, etc.), That initial step produces both smooth and jagged chromatographic profiles, which are then used for filtration and peak detection. During the vicinity elimination postprocessing step, peaks detected from smooth profiles (integrated from the full signal) are selected over peaks detected from jagged profiles (integrated from an incomplete signal).”

when many duplicate peaks occurred, I recommend to do:
1. Use larger “MS1 tolerance” of data collection tab.
2. Use lager “Mass slice width” of peak detection tab.
3. Use larger “Smoothing level” of peak detection tab.
4. Use larger “Retention time tolerance” of alignment tab.
5. Use larger “MS1 tolerance” of alignment tab.

More info: http://www.metabolomics-forum.com/index.php?topic=1459.msg4350#msg4350

MSMS ab cutoff:

QuantMass:

First of all, the chromatogram deconvolution is performed by the information of “pure” extracted ion chromatogram of several m/z values.
The quant mass is determined from such m/z values producing the singlet/no-coeluted chromatographic peak.

At least in TMS-based GC-MS based metabolomics, m/z 147 (2xTMS moiety) becomes base peak in EI-MS spectrum for many metabolite features, and therefore, such the m/z value should not be used as the quant mass even though the m/z value is the base peak’s m/z value. Of course, the quant mass values can be changed by the quant mass manager of MS-DIAL.
For the data comparison in the large scaled data set, we should keep the same quant mass value for metabolite quantification. MS-DIAL offers such an environment.

if you put “QUANTMASS:” field in your msp format file, the metabolite is always quantified by the target m/z.
Go to “Quantmass browser” of “Post processing” in the msdial menu, then, you can define the m/z values for each metabolite. The first option should be used if many biological samples are sequentially analyzed for a long time.
More info: http://www.metabolomics-forum.com/index.php?topic=1412.0

Sigma value: