※ Documentation:

This website is free and open to all users and there is no login requirement.


Results interpretation:

Here we use the human protein ZNF451 as an example. After you click "Submit", the predicted SUMOylation sites and SUMO-interacting motifs (SIMs) under the high threshold are shown as follows:

<1>. The table of the GPS-SUMO 2.0 results

ID: The name or ID of the input protein sequence.
Position: The position of the site predicted to be SUMOylated or to interact with a SUMO protein.
Peptide: The predicted peptide, comprising the modified residue with 7 amino acids upstream and 7 amino acids downstream.
Score: The value calculated by the GPS-SUMO algorithm to evaluate the potential for SUMOylation or SUMO interaction. The higher the score, the more likely the residue is to be SUMOylated or to interact with a SUMO protein.
Cut-off: The cutoff value under the selected threshold. Different thresholds correspond to different precision, sensitivity and specificity.
Type: Whether the sequence is a SUMOylation site or a SUMO-interacting motif (SIM).
Source: Whether the result has been validated experimentally. "Exp." means yes and "Pred." means no. "Exp." links to the CPLM 4.0 source site.
PPI: The proteins that interact with the substrate in the predicted SUMO events. For SUMOylation sites, we provide protein-protein interaction (PPI) information for the target protein and E3 enzymes. For SIMs, we provide PPI information for the target protein and SUMO proteins.
Logo: The sequence logo of this peptide.
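
The Peptide column described above can be illustrated with a minimal Python sketch that cuts a 15-residue window (7 upstream, the site itself, 7 downstream) out of a sequence. The function name peptide_window, the 1-based position convention and the "*" padding near the termini are assumptions made for illustration; they are not taken from the GPS-SUMO 2.0 implementation, and the sketch covers single-residue sites only.

```python
def peptide_window(sequence: str, position: int, flank: int = 7, pad: str = "*") -> str:
    """Return the residue at a 1-based position with `flank` residues on each side.

    Hypothetical helper: the padding character and 1-based indexing are
    assumptions, not documented GPS-SUMO 2.0 behaviour.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    index = position - 1  # convert to 0-based indexing
    left = sequence[max(0, index - flank):index]
    right = sequence[index + 1:index + 1 + flank]
    # Pad with a placeholder when the site lies close to either terminus.
    return left.rjust(flank, pad) + sequence[index] + right.ljust(flank, pad)


toy_sequence = "ACDEFGHIKLMNPQRSTVWY"  # toy sequence, not a real protein
print(peptide_window(toy_sequence, 9))
print(peptide_window(toy_sequence, 2))
```

With the toy sequence, position 9 lies far enough from both ends that no padding is needed, while position 2 is padded on the left.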

<2>. The visualization of default prediction

Part 1:
Left: The 3D structure of the substrate labeled with predicted sites.
Right: The distribution of negative and positive sites for SUMOylation sites.
       The distribution of negative and positive sites for SUMO-interacting motifs.
You can click the "Export" button to download the statistical graphics.

Part 2:
Top: The positional distribution of the predicted sites in the protein sequence. By default, the sites with the 3 highest predicted scores are displayed.
Bottom: The disordered regions of the protein predicted by IUPred [PMID: 15955779]. The cutoff is 0.5; a residue whose prediction score exceeds the cutoff is considered to be in a disordered region.
You can minimize the chart by clicking the "Min" button, maximize it by clicking the "Max" button, or use the slider to adjust the displayed length of the peptide sequence for convenient viewing.

<3>. The visualization of comprehensive prediction

Part 3:
Top: The surface accessibility of amino acids and the disordered regions of the protein, predicted by NetSurfP ver. 1.1 (PMID: 19646261) and IUPred (PMID: 15955779), respectively. The cutoff for disordered region prediction is 0.5; a residue whose score exceeds the cutoff is considered to be in a disordered region. The cutoff for surface accessibility prediction is 0.25; a residue whose score exceeds the cutoff is considered surface exposed.
Bottom: The positions of the predicted SUMOylation sites or SIMs in the protein sequence, shown together with the secondary structure predicted by NetSurfP ver. 1.1 (PMID: 19646261).
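
The two cutoffs used in these charts can be applied to per-residue scores as in the sketch below. It only shows the thresholding rule stated above (a score strictly greater than the cutoff counts as disordered or exposed); the function name annotate_residues and the list-based input are assumptions, and the scores would have to come from IUPred and NetSurfP, which are not called here.

```python
DISORDER_CUTOFF = 0.5   # from the text: score > 0.5 means disordered
EXPOSURE_CUTOFF = 0.25  # from the text: score > 0.25 means surface exposed


def annotate_residues(disorder_scores, accessibility_scores):
    """Label each residue as disordered and/or surface exposed.

    Hypothetical helper: both inputs are per-residue score lists of equal
    length, in sequence order.
    """
    if len(disorder_scores) != len(accessibility_scores):
        raise ValueError("score lists must have the same length")
    return [
        {
            "position": position,  # 1-based residue position
            "disordered": disorder > DISORDER_CUTOFF,
            "exposed": accessibility > EXPOSURE_CUTOFF,
        }
        for position, (disorder, accessibility) in enumerate(
            zip(disorder_scores, accessibility_scores), start=1
        )
    ]


# Toy scores for three residues, chosen only to exercise the cutoffs.
for row in annotate_residues([0.2, 0.5, 0.9], [0.1, 0.25, 0.6]):
    print(row)
```

A score exactly equal to a cutoff is not counted as disordered or exposed, because the rule is a strict "greater than".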

Frequently Asked Questions:

1. Q: How do I use the GPS-SUMO 2.0 web server?

A: Please visit GPS-SUMO 2.0 at the HOME page (https://sumo.biocuckoo.cn/index.php), where we provide the default service.

For advanced prediction, we provide 6 prediction versions on the ADVANCED page (https://sumo.biocuckoo.cn/advanced.php). You can tick a check box on the ADVANCED page to change the online service mode, or simply click one of the following predictor names:

(1) GPS-SUMO 2.0 (PLR): Prediction based on penalized logistic regression with the group-based prediction system feature. We provide the 3D structure, statistics and disorder propensity of the protein. (Also available on the HOME page.)
(2) GPS-SUMO 2.0 (Transformer): Prediction based on a Transformer with contextual information, which balances accuracy with speed.
(3) GPS-SUMO 2.0 (Comprehensive): Prediction based on all models with all features.
(4) GPS-SUMO 2.0 (Species-specific): Species-specific prediction based on all models with all features. We provide 13 species for species-specific prediction. Choose this version if you want to focus on a certain species.
(5) GPS-SUMO 2.0 (Comprehensive): Prediction based on all models with all features and additional annotations of secondary structure and surface accessibility.
(6) GPS-SUMO 2.0 (Stress conditions): Prediction based on penalized logistic regression, using 39,938 non-redundant SUMOylation sites identified under various stress conditions, such as SUMO protease inhibition, proteasome inhibition and heat shock.

In each of the above 6 prediction versions, you can choose sequence(s) in FASTA format or UniProt accession number(s) for prediction by clicking the corresponding check box.

Windows and Unix/Linux users can copy and paste FASTA-format sequences into the text form with the keyboard shortcuts "Ctrl+C" and "Ctrl+V". Mac users can use "Command+C" and "Command+V". You can input a single protein sequence or multiple protein sequences in FASTA format.
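
Before pasting, it can help to check that the input is well-formed FASTA. The sketch below is a minimal parser written for illustration only; the function name parse_fasta and its error rules (no empty or duplicate headers, no sequence data before the first ">" line) are assumptions, not the validation performed by the web server.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into a {header: sequence} dictionary.

    Hypothetical helper: the checks below are illustrative and may be
    stricter or looser than what the GPS-SUMO 2.0 server accepts.
    """
    records = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            if not current:
                raise ValueError("empty FASTA header")
            if current in records:
                raise ValueError(f"duplicate header: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may span several lines; store them in upper case.
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input with two made-up records, not real proteins.
example = ">protA\nMKT\nlys\n\n>protB example\nACD\n"
print(parse_fasta(example))
```

Each record starts with a ">" header line, and the sequence lines that follow it are joined into one string.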

Then click the "Submit" button to run the program. The prediction results will be shown in the prediction form. To save the results, click the "Download" button at the top of the prediction form and choose TXT, Excel or ZIP format. To download the statistical graphics, click the "Export" button.

2. Q: Is GPS-SUMO 2.0 much better than GPS-SUMO?

A: Yes. Firstly, the fourth-generation GPS (Group-based Prediction System) algorithm is retained in GPS-SUMO 2.0, and more sequence- and structure-based features and deep learning algorithms have been added. The prediction performance is greatly improved over our previous tools. Secondly, we updated the input options so that users can predict with protein sequences or identifiers. We also provide PPI information, sequence logos, 3D structures, links to the PTM database, and predictions of secondary structure and disorder propensity. The visualization and user-friendliness are greatly improved. Thirdly, the training data set of GPS-SUMO 2.0 was updated by searching the scientific literature published before June 2020 and 13 PTM databases, yielding the largest training data set so far. Thus, the prediction accuracy of GPS-SUMO 2.0 is significantly improved.

Data statistics and functions of GPS-SUMO 2.0, GPS-SUMO, SUMOsp 2.0 and SUMOsp
                              GPS-SUMO 2.0   GPS-SUMO   SUMOsp 2.0   SUMOsp
Non-redundant Data Sets
SUMOylation Sites             59,069         912        332          239
SUMO Substrates               10,762         510        197          144
SUMO-interacting Proteins     102            80         0            0
Data Size                     ~7.2 GB        ~30 MB     ~17 MB       ~10 MB
Algorithms and Functions
SIM Predictor                 ✓              ✓          ×            ×
Species-specific Prediction   ✓              ×          ×            ×
PPI Pairs                     27,482         ×          ×            ×
3D Structures                 6,428          ×          ×            ×

3. Q: There are three thresholds in your predictor. What do these parameters mean?

A: There are two types of predictors, one for SUMOylation sites and one for SUMO-interacting motifs, and each threshold option affects only the corresponding predictor. After the GPS-SUMO 2.0 model was trained, we evaluated its performance and chose three thresholds with high, medium and low stringency. The performance under these three thresholds is presented as follows:

The performance of GPS-SUMO under different thresholds
            SUMOylation                              SUMO interaction
Threshold   Ac      Sn      Sp      MCC     Pr       Ac      Sn      Sp      MCC     Pr
High        88.63%  57.45%  95.17%  0.5749  71.39%   94.64%  90.06%  95.17%  0.7551  68.08%
Medium      86.60%  68.24%  90.46%  0.5585  60.00%   90.88%  98.14%  90.05%  0.6823  53.02%
Low         84.33%  75.98%  85.01%  0.5293  51.54%   86.80%  99.38%  85.36%  0.6081  43.72%
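
The column abbreviations are not defined on this page. Assuming they carry their usual meanings (Ac = accuracy, Sn = sensitivity, Sp = specificity, MCC = Matthews correlation coefficient, Pr = precision), the measures can be computed from the counts of true/false positives and negatives as in the sketch below. The function name performance and the toy counts are illustrative and are not the data behind the table above.

```python
import math


def _ratio(numerator, denominator):
    # Return 0.0 instead of failing when a class is empty (a design choice
    # made for this sketch only).
    return numerator / denominator if denominator else 0.0


def performance(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification measures from a confusion matrix."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Ac": _ratio(tp + tn, tp + fp + tn + fn),  # accuracy
        "Sn": _ratio(tp, tp + fn),                 # sensitivity (recall)
        "Sp": _ratio(tn, tn + fp),                 # specificity
        "Pr": _ratio(tp, tp + fp),                 # precision
        "MCC": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Toy counts chosen for easy arithmetic.
print(performance(tp=40, fp=10, tn=40, fn=10))
```

Under these definitions, a more stringent threshold typically raises Sp and Pr at the cost of Sn, which matches the trend from Low to High in the table.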

4. Q: I have a few questions that are not listed above. How can I contact the authors of GPS-SUMO 2.0?

A: Please contact the corresponding author, Dr. Yu Xue, for details.

5. Q: Can I use GPS-SUMO 2.0 on different browsers?

A: Yes, we have tested our web server on different browsers.

Browser compatibility
OS        Version        Chrome           Firefox   Microsoft Edge   Safari
Linux     Ubuntu 18.04   107.0.5304.107   107.0.1   N/A              N/A
Windows   10             107.0.5304.107   107.0.1   108.0.1462.46    N/A