SPSCAN project database

Project database


Spscan does not store its internal status between sessions, so the same peaklists and spectra are not shown automatically when you restart the program (as they are in xeasy). Instead, it uses a "project database" to collect information about peaklists, spectra, and other data. Only a single project database can be loaded at a time.

Peaklists and spectra are known to the project under a short name or 'id'. When you load a peaklist (or spectrum), you are offered a selection of the peaklists in the project, listed under their short names. With "new peaklist" you get the normal tool for loading a peaklist by its file name. Even if you do not want to organize your data in this way, a few commands need a 'project' for formal reasons.

You can load a project database by selecting 'project' - 'edit'. If you press abort instead of loading an existing database, a new project is created. You have to give the new project a name and load a sequence (a file with at least a single entry "GLY 1").

If you name a project in the resource file, this project is always loaded when you start the program.

Sequential connection between residues

Most information and methods concerning the sequential connectivity between residues, and the mapping of sequentially connected residues to the "real" sequence are found in pj->seq_hd of class sequential_handler.

Define probability of a sequential connection in a library

The probability that fragments "prev" and "next" are sequentially connected is determined in two steps: sequential_handler::build_matrix() fills pj->seq_hd->pf_mat with individual values, and norm_matrix() brings the sums of rows and columns to the expected value. The initial values for a sequential connection are calculated from scores that are provided by float seqscore->match(prev, next). The scorer seq_score * seqscore is normally defined by a library. The library format is not yet documented; for an example see conn.lib.

The possibility to use partially assigned peaklists for calculation of a sequential match uses a library of atom names and weights. The following names have a special meaning:

Define probability of a sequential connection on the source code level

If these methods are not appropriate, any way to get a score is possible if a new class xxx : public seq_score is defined, which provides a scoring function "float match(int prev, int next)".

prev and next are the fragment numbers of the two residues. A zero score means that the available information neither increases nor reduces the probability that these two fragments are sequential. A positive score x means that the probability of a sequential connection is exp(x) times higher. A negative score means that the probability of a sequential connection is lower than 1/n. It is thus appropriate to add the scores of independent conditions that speak in favour of a connection. The relevant functions are in "sequence.cc".
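A minimal sketch of such a class is given here. The base class shown is a hypothetical stand-in, since the real declaration of seq_score is in the SPSCAN sources and may differ; combined_score, ScoreTable and probability_factor are names invented for this illustration. The example adds two independent score tables, which corresponds to multiplying their exp(x) factors.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SPSCAN base class (assumption).
class seq_score {
public:
    virtual ~seq_score() = default;
    virtual float match(int prev, int next) = 0;
};

using ScoreTable = std::vector<std::vector<float>>;

// Illustrative scorer: two independent pieces of evidence,
// each given as a table indexed by [prev][next].
class combined_score : public seq_score {
public:
    combined_score(ScoreTable a, ScoreTable b)
        : a_(std::move(a)), b_(std::move(b)) {}

    float match(int prev, int next) override {
        // 0 means "no information"; independent scores are added.
        return a_[prev][next] + b_[prev][next];
    }

private:
    ScoreTable a_, b_;
};

// Factor by which a score x changes the probability: exp(x).
inline double probability_factor(float score) {
    return std::exp(static_cast<double>(score));
}
```

Because the scores are logarithmic, a score of 0.5 from one condition and 0.25 from another give a combined factor of exp(0.75), the product of the two individual factors.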

Check sequential connections interactively

To gain confidence in the sequential connections that were found with the highest probability, and to choose between possible sequential neighbours whose probabilities are of the same order of magnitude, it is necessary to compare the spectra of these residues. This is done with the strip_comparison tool. The tool is called from the interface with "project"/"organize db" - "show connection". The information about which spectra should be shown for residue "n-1" and for residue "n" is read from a library, e.g. conn.lib. If a library is defined by the resource Spscan.sequential_connection_show: or Spscan.sequential_connection:, this library is used.

You can display the best-matching three residues in each direction, confirm or reject sequential connections, change probabilities, or try to map the connected residues to the sequence.

Peaklist manipulations

This table handler is invoked with "project"/"peaklist manipulations". The peaklists considered need not be part of the project, but both must have identical assignment schemes. Only assignment numbers are checked; they are not resolved into fragment/atom. Peaks in the two lists correspond if they have identical assignment numbers and their ppm positions differ by less than a given distance.

display status
    displays current parameters, mode, and names of the peaklists loaded
load peaklist 1 (high priority)
load peaklist 2
    load the two lists. If corresponding peaks are found in both lists 1 and 2, ppm values and linewidths are taken from list 1.
exchange 1 <--> 2
    exchanges peaklists 1 and 2
merge peaklists 1 + 2
    writes all peaks that are in at least one of the peaklists. If corresponding peaks are in both lists, only the peak from list 1 is taken.
difference 2 - 1
    writes all peaks from list 2 that have no corresponding peak in list 1.
2D mode
    set mode=CMP_2D: comparing only dimensions 1/2, adding only the first peak
strip mode
    set mode=CMP_strip: comparing only dimensions 1/2, adding all peaks, i.e. if a strip from peaklist 2 does not exist in peaklist 1, all peaks from this strip are used. If the strip exists in peaklist 1, no peak of this strip is used from peaklist 2.
3D mode
    set mode=CMP_3D: comparing all 3 dimensions
acceptable ppm differences
    enter the maximum distance for two peaks to correspond to each other. Default is Lw_e[] of peaklist 1.
get assignments
    uses the assignments in one list to assign peaks in the other list. Interface to pal_list::take_assignments.
remove duplicate peaks (1)
    removes peaks within list 1 that correspond to each other. If this is done in "2D mode" or "strip mode" for a 3D peaklist, the result is a list with one peak for each strip.
check duplicate assignments (1)
    finds peaks that have the same assignment but whose positions differ by more than the given value.
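
The correspondence rule and the "merge" and "difference" commands can be sketched in C++ as follows. This is an illustration under stated assumptions, not the SPSCAN implementation: Peak, Tolerance and all function names are hypothetical, the special handling of "strip mode" is left out, and restricting the assignment check to the compared dimensions is a guess. The parameter dims is 2 for a comparison of dimensions 1/2 and 3 for a comparison of all dimensions.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Hypothetical minimal peak record (assumption, not SPSCAN's peak type).
struct Peak {
    std::array<int, 3> assign;   // assignment numbers, one per dimension
    std::array<double, 3> ppm;   // position in ppm, one per dimension
};

using Tolerance = std::array<double, 3>;  // acceptable ppm difference per dimension

// Two peaks correspond if, in each compared dimension, the assignment
// numbers are identical and the ppm difference is below the tolerance.
bool corresponds(const Peak& a, const Peak& b, const Tolerance& tol, int dims) {
    for (int d = 0; d < dims; ++d) {
        if (a.assign[d] != b.assign[d]) return false;
        if (std::fabs(a.ppm[d] - b.ppm[d]) >= tol[d]) return false;
    }
    return true;
}

bool has_partner(const Peak& p, const std::vector<Peak>& list,
                 const Tolerance& tol, int dims) {
    for (const Peak& q : list)
        if (corresponds(p, q, tol, dims)) return true;
    return false;
}

// "merge peaklists 1 + 2": everything from list 1, plus the peaks of
// list 2 that have no corresponding peak in list 1.
std::vector<Peak> merge_lists(const std::vector<Peak>& l1,
                              const std::vector<Peak>& l2,
                              const Tolerance& tol, int dims = 3) {
    std::vector<Peak> out = l1;
    for (const Peak& p : l2)
        if (!has_partner(p, l1, tol, dims)) out.push_back(p);
    return out;
}

// "difference 2 - 1": peaks of list 2 without a corresponding peak in list 1.
std::vector<Peak> difference(const std::vector<Peak>& l2,
                             const std::vector<Peak>& l1,
                             const Tolerance& tol, int dims = 3) {
    std::vector<Peak> out;
    for (const Peak& p : l2)
        if (!has_partner(p, l1, tol, dims)) out.push_back(p);
    return out;
}
```

Note that in this sketch the merged list is simply list 1 followed by the result of "difference 2 - 1", which reflects the priority of list 1 for corresponding peaks.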

Peaklist simulation

SPSCAN can simulate peaklists according to entries in a library that define which atoms give a crosspeak. A library spec_sim.lib is provided with the program; use the entries in this library as a template to define your own lists.

The first entry is the fragment type; "ALL" matches all fragment types. The second entry is the name of the atom in w1. The next entries are the atom names and relative positions of the atoms in w2, w3, and so on. If no relative position is given it defaults to 0, i.e. the atom is in the same fragment. Only the last relative position can be omitted; a relative position between two atom names must not be omitted. For the atom in w1 no relative position is given; it is 0 by definition.

"h" as the first character and "*" as the last character of an atom name in the library have a special meaning: "h" replaces "Q" or "H" and "*" replaces anything including an empty string. Comments in the library start with "!". Empty lines are ignored. If you define a dimension larger than PAL_DIM, the program will probably crash. If you do not define the dimension the default is 3.

The method project::simulate_peaklist() handles the combination of library and atom list. The routine is called by "project" - "simulate peaklist". The routine always uses the proton and sequence lists of the project file. If you want to simulate a peaklist with a particular proton and sequence list, you have to go to another directory and create a new project file.
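
The library conventions described above (entry layout, the "h" and "*" wildcards, "!" comments) can be sketched in C++ as follows. This is only an illustration and not the parser used by project::simulate_peaklist(): the type and function names are hypothetical, and whitespace-separated tokens, comments running from "!" to the end of the line, and a lowercase "h" wildcard are assumptions. The sketch is also more lenient than the rule in the text, because it accepts omitted relative positions anywhere.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct AtomRef { std::string name; int rel; };   // rel: relative fragment position
struct SimEntry {
    std::string fragment_type;                   // e.g. "ALL"
    std::vector<AtomRef> atoms;                  // atoms[0] is w1, atoms[1] is w2, ...
};

bool is_integer(const std::string& t) {
    std::size_t i = (!t.empty() && (t[0] == '-' || t[0] == '+')) ? 1 : 0;
    if (i == t.size()) return false;
    for (; i < t.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(t[i]))) return false;
    return true;
}

// Parse one library line; an empty or comment-only line gives an empty entry.
SimEntry parse_entry(const std::string& line) {
    std::istringstream in(line.substr(0, line.find('!')));  // drop comment
    SimEntry e;
    std::string tok;
    in >> e.fragment_type;
    if (in >> tok) e.atoms.push_back(AtomRef{tok, 0});      // w1 atom, rel is 0
    while (in >> tok) {
        if (is_integer(tok) && e.atoms.size() > 1)
            e.atoms.back().rel = std::stoi(tok);            // position of last atom
        else
            e.atoms.push_back(AtomRef{tok, 0});             // default position 0
    }
    return e;
}

// Does library pattern `pat` match atom name `name`?
bool atom_matches(std::string pat, std::string name) {
    const bool any_tail = !pat.empty() && pat.back() == '*';
    if (any_tail) pat.pop_back();
    if (!pat.empty() && pat[0] == 'h') {         // "h" stands for "H" or "Q"
        if (name.empty() || (name[0] != 'H' && name[0] != 'Q')) return false;
        pat.erase(0, 1);
        name.erase(0, 1);
    }
    if (any_tail) return name.compare(0, pat.size(), pat) == 0;  // prefix match
    return name == pat;
}
```

For a hypothetical line such as "ALL HN N 0 CA -1", the sketch yields the w1 atom HN, the w2 atom N in the same fragment, and the w3 atom CA in the preceding fragment.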

Adapt calibration of spectra

In the "3D tool", spscan provides routines to adapt the calibration of a spectrum in such a way that the positions of all peaks give the best match with the positions stored in a proton list (which may have been adapted to another peaklist beforehand). This does not mean that the new calibration is correct; it only means that it gives the least-squares deviation of peak positions with respect to some other spectrum. So it is important that you start with a spectrum that is calibrated correctly.

There are four commands (visible if you scroll below "exit"):

check position 1/2
    shows deviations of the ppm position of peaks in the two "peak dimensions" with respect to the local proton list. p0[0] and p0[1] are corrected.
check sweep width 1
check sweep width 2
check sweep width 3
    shows deviations of the ppm position as a function of ppm position in one of the three dimensions. p0[n-1] and psw[n-1] are corrected.

You can change the proposed corrections by moving the crosshair with the left mouse button (check position 1/2) or by moving the magenta line with the middle mouse button (check sweep width n).
"print info" displays the changes that will take place if you press "change calibration and adapt peaklists".

If you press "change ... " the following corrections are made:

This routine is used in the following way: You adapt at least some assigned peaks to the spectrum you want to check - either automatically or interactively. Then you compare the positions of the peaks with the respective peak positions of a suitable reference list. You recalibrate the spectrum, and at the same time you shift the peaks in such a way that their position with respect to the pixel position in the spectrum is unchanged.

There are two situations where you should discard the shifted peaklist and start again with the recalibrated spectrum and the original (not adapted, not shifted) peaklist:

Shifting the peaklist with the spectrum is ok for those peaks that have previously been adapted to the old calibration of the spectrum. It is not ok for those peaks that could not be adapted and were still around the position they had in another spectrum. Loading a peaklist that matches the reference spectrum, however, will bring all peaks to a position near their real position in the recalibrated spectrum.

You can "query" (select with middle mouse button, "q") volume and quality of the peaks that gave rise to the correction in order to distinguish relevant and irrelevant peaks.

(It has been claimed that correction of the sweep width is obsolete, because the sweep width can be calculated exactly. Theoretically I agree. Practically I can show you a number of spectra where the sweep width is wrong, usually by (n-1)/n, and the error is not always recognized.)

Ralf W. Glaser
Institut für Molekularbiologie & Biophysik
ETH Hönggerberg
CH-8093 Zürich
E-mail: ralf@mol.biol.ethz.ch