Mass spectrometry is the leading tool for rapid, in-depth analysis of the protein composition of biological samples. Spectral counting is an effective, label-free method for measuring the relative abundances of proteins within these samples, but effective tools for making use of these valuable data are lacking. Because mass spectrometry requires replicate analyses of a sample to probe the proteome in depth, a necessary tool would allow the experimenter to use preliminary data to predict the rate of future protein detections and determine the minimum number of runs needed. Here, I introduce such a method, based on the fundamentally different behavior of true and false positive protein detections in repeat runs of the same sample or in a series of clinical samples, thus allowing for more efficient experimental design. After proteins have been detected in such an experiment, there is often a need to compare protein abundance levels between two samples, as in biomarker discovery projects. The current tools for detecting differential expression in spectral count space are not satisfactory because they do not take into account the skewed distribution of these data. Therefore, I developed ReSASC (Resampling-based Statistical Analysis for Spectral Counts) to detect differential expression between two samples based on spectral count output alone. The algorithm works on the principle that similarly expressed proteins are expected to have similar distributions of spectral counts and can be grouped effectively. The output from this algorithm represents only one of the many outputs of a typical mass spectrometry experiment, and these data are not currently used for global analysis of results. Data are generally analyzed at one level rather than globally, although each step of a mass spectrometry experiment, and the data resulting from that step, is critical to later stages and therefore to the final output.
Here I adapted FlowJo, visualization software currently used for flow cytometry analysis, and show that it offers useful insight when applied to proteomics datasets and allows for unexpected discoveries. Together, these methods constitute a valuable set of tools for more effective mass spectrometry analysis, with useful applications to future biomarker discovery work.