Year, pagecount:2008, 6 page(s)

ySelect Users’ Guide (Sequest-compatible version)

Background. During the course of a typical run, Sequest [1] examines fragmentation spectra obtained from an LC-MS/MS experiment and tries to match them with peptide sequences drawn from a protein database. It reads the peak list file (.dta format) for each fragmentation spectrum in turn and writes out a companion search results file (.out format) listing the best matches with supporting information. The utility program DTASelect [2] may then be invoked with the “-n” option to comb through all of the generated .out files and, among other things, assemble a list that includes one line of information for the best match to each fragmentation spectrum. A sample fragment of a DTASelect.txt file is included in the Appendix. As Sequest and DTASelect have been described elsewhere, please refer to the references for further information.

The purpose of the ySelect program is to comb through the DTASelect-generated “DTASelect.txt” file
in turn, filtering out matches to spectra that fail to meet a confidence cutoff level and producing a list of those that passed for consumption by yRatios. ySelect was implemented separately from its companion program, yRatios, to simplify the process of integrating yRatios’ functionality with the results from other search programs such as Mascot. Further information concerning yRatios is provided in a separate users’ guide.

Algorithmic Details. ySelect uses the PRISM strategy [3] to assess confidence levels. Briefly, a large dataset including decoy database hits was mapped onto a multidimensional space with axes representing such parameters as cross-correlation; by examining the relative density of protein hits and decoy hits, an empirical function predicting confidence levels was derived. All necessary parameters for the function (XCorr, deltCN, spRank and charge) are present in the “DTASelect.txt” file.

How to compile ySelect. The ySelect program is written in C. To compile
it on a UNIX platform, set the current working directory to the directory into which the ySelect source has been copied and enter “cc ySelect.c -lm -o ySelect” at the shell prompt. It is also readily possible to compile ySelect on any Windows platform that has a C compiler installed. Freeware programs such as Quincy 2005 are available on the web and are more than adequate for this purpose. The ySelect executable may be run from a Windows command prompt.

Prior to compilation, one computational detail needs attending to, namely the PATH_SEP_CHAR defined constant. If compiling on a UNIX system, this should be set to ‘/’ (i.e., a forward slash). On a Windows system, it should instead be set to ‘\\’ (i.e., a backslash; an extra backslash is necessary as an escape character as part of the C language convention).

Inputs. Invoking ySelect without any arguments, or with arguments incorrectly specified, will cause the following text to be displayed:

Usage: ySelect [option(s)] <DTASelect file> . Options are:
  -q <confidence>   set the confidence cutoff
  -L <locus>        results for one locus only
  -1                ignore singly-charged matches
  -d                output directory+filename

The “-q” option is used to set a confidence level cutoff. The accepted value is a real number from 0 to 99.5 inclusive. If 95 were specified, for example, a 95% confidence level cutoff would be applied. If the option is left unspecified, the cutoff defaults to 99%.

Users may wish to exclude matches to spectra based on presumed singly charged species of peptides. This can be accomplished by specifying the “-1” option; only matches based on the assumption of a multiply charged species (+2 or +3) will then be listed.

It is necessary to provide the file name of at least one DTASelect.txt file as an argument to ySelect. Although an arbitrary number of these files may be specified, the ability to handle more than one was implemented for the purpose of merging the quality-filtered results of precisely two searches performed on the same
set of .dta files: one normal and one presuming that the tryptic peptides have been labeled with two 18O atoms at their C-terminal carboxyl groups (this labeling is explained more fully in the yRatios Users’ Guide).

Output. All ySelect output is directed to standard output (i.e., the shell or command window). To store it in a file, redirect the output (i.e., add “> <file name>” at the end of the command); this works at both a UNIX shell prompt and a Windows command prompt.

The output itself is a “scan list”, consisting of two types of lines. The locus line begins with “---locus---”, followed by a space and then a protein identifier. Until the next locus line is encountered, all spectra and peptides listed are considered to belong to that locus. Accordingly, the second type of line names a spectrum file and gives the amino acid sequence of the associated peptide. The sequence of the peptide may appear on more than one line because an LC-MS/MS experiment may capture
more than one spectrum that matches the peptide. It is presumed that all the spectra associated with a given peptide will appear in successive lines, rather than being scattered about in the file. A portion of a scan list file is provided as a sample below:

--locus-- B0035.7
CW 042507 worm both step06.15642156422dta   VGAGAPVYLAAVLEYLAAEVLELAGNAAR
CW 042507 worm both step06.15630156302dta   VGAGAPVYLAAVLEYLAAEVLELAGNAAR
CW 042507 worm both step06.15300153002dta   VGAGAPVYLAAVLEYLAAEVLELAGNAAR
CW 042507 worm both step06.15289152892dta   VGAGAPVYLAAVLEYLAAEVLELAGNAAR
CW 042507 worm both step04.14782147822dta   LLAGVTIAQGGVLPNIQAVLLPK
CW 042507 worm both step05.12404124042dta   LLAGVTIAQGGVLPNIQAVLLPK
CW 042507 worm both step04.15550155502dta   LLAGVTIAQGGVLPNIQAVLLPK
CW 042507 worm both step04.14749147492dta   LLAGVTIAQGGVLPNIQAVLLPK
--locus-- B0035.9
CW 042507 worm both step05.865386532dta     TVTAMDVVYALK
CW 042507 worm both step05.863986392dta     TVTAMDVVYALK
--locus-- B0041.4
CW 042507 worm both step03.605060501dta     NIPGVDVMNVER
CW 042507 worm both step03.603960391dta     NIPGVDVMNVER
CW 042507 worm both step05.762776272dta     GHVIDQVAEVPLVVSDK
CW 042507 worm both step05.762376232dta     GHVIDQVAEVPLVVSDK
CW 042507 worm both step04.683168312dta     GHVIDQVAEVPLVVSDK
CW 042507 worm both step04.681168112dta     GHVIDQVAEVPLVVSDK
CW 042507 worm both step04.680768072dta     GHVIDQVAEVPLVVSDK
CW 042507 worm both step04.679667962dta     GHVIDQVAEVPLVVSDK
CW 042507 worm both step05.769976991dta     GHVIDQVAEVPLVVSDK
CW 042507 worm both step05.769876981dta     GHVIDQVAEVPLVVSDK

ySelect lists the loci in alphabetical order. To quickly examine a single protein of interest, use the “-L” option. The scan list subsequently produced will be for that locus alone, rather than the full list for all loci having matches to spectra that meet the confidence level cutoff. If the given locus doesn’t have any such spectra or is not represented in the DTASelect.txt file at all, the scan list will be empty.

In the sample above,
only the names of the .dta files appear. If the “-d” option is specified, the directory that each file was in will be prefixed to the file’s name. This is useful in those cases where multi-step LC-MS/MS (MudPIT) was performed, with the spectrum files being deposited in several different subdirectories. Instead of the files having to be gathered into one directory prior to running yRatios, yRatios can be invoked with the current working directory unchanged following the DTASelect and ySelect runs.

References.
1. Eng, JK; McCormack, AL; Yates, JR III. An Approach to Correlate Tandem Mass Spectral Data of Peptides with Amino Acid Sequences in a Protein Database. 1994. J Am Soc Mass Spectrom 5:976-89
2. Tabb, DL; McDonald, WH; Yates, JR III. DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. 2002. J Proteome Res 1(1):21-6
3. Kislinger, T; Rahman, K; Radulovic, D; Cox, B; Rossant, J; Emili, A. PRISM, a generic large-scale proteomic
strategy for mammals. 2003. Mol Cell Proteomics 2(2):96-106

Appendix

Sample fragment of a DTASelect.txt file (note that extensive line-wrapping has occurred):

DTASelect v1.8 /data/search/Carl 2007/O16vsO18/both/2007-04-25 S /data/dbase/C.elegans/wormpepandreverseAug2006fasta 00 0.0 0.0 0.0 true true SM 57.0 C DM 0.0 * STY 0.0 0.0 DM 0.0 # M 0.0 0.0 DM 0.0 @ KR 0.0 0.0
Type Locus Length MolWt pI Gene Name
Type Filename Subdirectory XCorr DeltCN PrecursorMass TotalIntensity SpRank IonProportion Sequence SequencePosition Tryptic UniqueToLocus
L 2L52.1 427 50017.95 8.387695 CE32090 WBGene00007063 Zinc finger, C2H2 type status:Partially confirmed TR:Q9XWB3 protein id:CAA21776.2 U
D CW 042507 worm both step04.261926191 CW 042507 worm both step04 08276 0.0152 4258 29675 4 1.0 K.EFKS 132 2 false U
D CW 042507 worm both step04.362136211 CW 042507 worm both step04 07774 0.1261 42449 39901 2 1.0 K.EFKS 132 2 false U
D CW 042507 worm both step04.586958692 CW 042507 worm both step04 17488 0.0755 150494 8773.7 5 0.45833334 K.MPKIEVEDSLVNKF 387 2 true U
D CW 042507 worm both step04.335233521 CW 042507 worm both step04 09388 0.0334 51616 30636 9 0.6666667 K.RPSRA 404 2 false U
D CW 042507 worm both step04.340134011 CW 042507 worm both step04 10408 0.0774 51329 28324 5 0.6666667 K.RPSRA 404 2 false U
D CW 042507 worm both step04.418641861 CW 042507 worm both step04 09512 0.0163 5126 35508 29 0.5 K.RPSRA 404 2 false U
D CW 042507 worm both step06.917191713 CW 042507 worm both step06 20919 0.1901 390119 10440.4 29 0.12096774 R.CNYDSDESELESDEFWSATEMSDNEEVYVNFRG 187 2 true U
D CW 042507 worm both step02.14490144902 CW 042507 worm both step02 1.4666 0022 220257 7637.2 231 0.20588236 R.EECIQPVSVEKNILHFEKF 283 2 true U
D CW 042507 worm both step03.323332331 CW 042507 worm both step03 10342 0.0136 50727 33240 77 0.5 R.ENNKF 311 2 false U
L 2RSSE.1 343 37960.273 9.23877 CE32785 WBGene00007064 status:Partially confirmed TR:Q8I133 protein id:CAD59137.1 U
D CW 042507 worm both step04.11120111203 CW 042507 worm both step04 1.9126 00075 356343 6773.9 72 0.10294118 K.CAGAYSLAAIHLAEEASPEPTPTTSKPPRGNGVGRA 268 2 true U
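
As an illustration of how the “D” rows in the fragment above might be read, the following is a minimal C sketch; it is not taken from the ySelect source. It assumes that the fields of a DTASelect.txt row are tab-delimited and appear in the order given by the column header (Type, Filename, Subdirectory, XCorr, DeltCN, PrecursorMass, TotalIntensity, SpRank, IonProportion, Sequence), and that the charge state is the final dot-separated component of the spectrum name (base.first.last.charge, optionally followed by “.dta”). The names struct match, dta_charge, parse_d_line and passes_charge_filter are hypothetical. The PRISM-derived confidence function itself is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELD 256

/* One best-match row ("D" line) of a DTASelect.txt file. */
struct match {
    char filename[MAX_FIELD];   /* spectrum name                          */
    char subdir[MAX_FIELD];     /* subdirectory holding the .dta file     */
    char sequence[MAX_FIELD];   /* matched peptide sequence               */
    double xcorr;
    double deltcn;
    int sprank;
    int charge;                 /* taken from the spectrum name, or -1    */
};

/* Return the charge encoded in a spectrum name of the ASSUMED form
 * base.first.last.charge (optionally followed by ".dta"), or -1 if the
 * name does not end in ".<digit 1-9>". */
int dta_charge(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (len >= 4 && strcmp(name + len - 4, ".dta") == 0)
        len -= 4;                           /* ignore a trailing ".dta" */
    if (len < 2 || name[len - 2] != '.')
        return -1;
    char c = name[len - 1];
    return (c >= '1' && c <= '9') ? c - '0' : -1;
}

/* Parse one tab-delimited "D" line into *m (ASSUMED column order, see
 * the lead-in). Returns 1 on success, 0 if the line is not a usable
 * D line. */
int parse_d_line(const char *line, struct match *m)
{
    char buf[2048];
    char *field[16];
    int n = 0;

    assert(line != NULL && m != NULL);
    if (strlen(line) >= sizeof buf)
        return 0;
    strcpy(buf, line);
    buf[strcspn(buf, "\r\n")] = '\0';       /* drop the line terminator */

    /* Split on tabs by hand so that empty fields are preserved. */
    char *p = buf;
    while (n < 16) {
        field[n++] = p;
        p = strchr(p, '\t');
        if (p == NULL)
            break;
        *p++ = '\0';
    }
    if (n < 10 || strcmp(field[0], "D") != 0)
        return 0;
    if (strlen(field[1]) >= MAX_FIELD || strlen(field[2]) >= MAX_FIELD ||
        strlen(field[9]) >= MAX_FIELD)
        return 0;

    strcpy(m->filename, field[1]);
    strcpy(m->subdir, field[2]);
    strcpy(m->sequence, field[9]);
    m->xcorr  = strtod(field[3], NULL);
    m->deltcn = strtod(field[4], NULL);
    m->sprank = atoi(field[7]);
    m->charge = dta_charge(m->filename);
    return 1;
}

/* Mirror of the "-1" option: when ignore_singly is nonzero, reject
 * matches whose charge is +1. Returns 1 to keep the match, 0 to drop it. */
int passes_charge_filter(const struct match *m, int ignore_singly)
{
    return !(ignore_singly && m->charge == 1);
}
```

A caller might read DTASelect.txt line by line, remember the locus named on the most recent “L” row, pass every other row to parse_d_line, and skip any match for which passes_charge_filter returns 0 when “-1” has been given. Splitting on tabs by hand rather than with strtok is a deliberate choice here, since strtok would silently merge adjacent delimiters and shift the columns if a field were empty.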