Most modern tools used to predict sites of small ubiquitin-like modifier (SUMO) binding (referred to as
SUMOylation) use algorithms, chemical features of the protein, and consensus motifs. However, these
tools rarely consider the influence of post-translational modification (PTM) information for other sites
within the same protein on the accuracy of prediction results. This study applied the Random Forest
machine learning method, as well as motif screening models and a feature selection combination
mechanism, to develop a SUMOylation prediction system, referred to as SUMOgo. In the prediction
method, PTM sites were encoded as new functional features in addition to structural features,
such as sequence-based binary coding, encoded chemical features of proteins, and encoded secondary
structure information that is important for PTM. Twenty cycles of prediction were conducted with a
1:1 combination of positive test data and random negative data. The Matthews correlation coefficient of
SUMOgo reached 0.511, which is higher than that of current commonly used tools. This study further
verified the important role of PTM in SUMOgo and includes a case study on CREB binding protein
(CREBBP).
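
The evaluation protocol summarized above (twenty cycles, each pairing the positive test data with an equal number of randomly drawn negatives, scored by the Matthews correlation coefficient) can be illustrated with a minimal sketch. The function names `mcc`, `balanced_cycles`, and `mean_mcc`, the fixed seed, the averaging over cycles, and the `predict` callable are assumptions introduced for illustration only; they are not taken from the SUMOgo implementation.

```python
import math
import random


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def balanced_cycles(positives, negatives, n_cycles=20, seed=0):
    """Yield (positives, sampled_negatives) pairs at a 1:1 ratio.

    Assumes there are at least as many negatives as positives.
    """
    rng = random.Random(seed)
    for _ in range(n_cycles):
        yield positives, rng.sample(negatives, len(positives))


def mean_mcc(predict, positives, negatives, n_cycles=20, seed=0):
    """Average the MCC of a hypothetical `predict(site) -> bool` over all cycles."""
    scores = []
    for pos, neg in balanced_cycles(positives, negatives, n_cycles, seed):
        tp = sum(1 for site in pos if predict(site))
        fn = len(pos) - tp
        fp = sum(1 for site in neg if predict(site))
        tn = len(neg) - fp
        scores.append(mcc(tp, tn, fp, fn))
    return sum(scores) / len(scores)
```

Resampling the negatives in every cycle keeps the classes balanced while reducing the dependence of the reported score on any single random draw of non-SUMOylated sites.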