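Phylogenetic trees help us understand the evolutionary relationships among all living organisms. Many studies have established methods for inferring these relationships, but most of those methods must go through a multiple sequence alignment (MSA) step. A number of studies have pointed out that the order in which sequences are added to the MSA significantly affects the alignment result. We therefore ask whether another method could make the result more reliable. Our goal is to establish a method for constructing phylogenetic trees that is unique and reasonable.
In this study we propose a new method to replace the inference procedure of the conventional methods: pairwise sequence comparison with the Basic Local Alignment Search Tool (BLAST) is combined with Pearson's correlation coefficient (PCC) to model the evolutionary relationships among sequences, and the sequences are then grouped by hierarchical clustering (HC). Our experiments show that the method does mitigate the problem in conventional methods whereby a change in input order affects the result, and that it clusters better and more reasonably than conventional methods. We will use this method for the evolutionary analysis of the protein arginine methyltransferase (PRMT) family. In addition, we want to use the PRMT family as a template to look for a way to distinguish the characteristic features of these families, so that unknown sequences can be quickly classified or named.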
A phylogenetic tree depicts the evolutionary relationships among living organisms. Many methods have been developed to infer these relationships, but most of them require a multiple sequence alignment (MSA) as a preliminary step. Several studies have shown that the order in which sequences are added to an MSA can significantly affect the resulting alignment. We therefore ask whether an alternative approach can yield more reliable results. Our goal is to establish a method that builds a unique and reasonable phylogenetic tree.
Here we propose a novel approach that replaces the MSA step. We combine pairwise sequence alignment (BLAST) with Pearson's correlation coefficient (PCC) to model the relationships among the compared sequences, and then group the sequences by hierarchical clustering (HC). Our results show that the method alleviates the dependence on input order seen in MSA-based approaches. It also clusters sequences better than conventional methods and produces a more reasonable tree. We then apply the method to a phylogenetic analysis of the protein arginine methyltransferase (PRMT) families. In addition, using the PRMT families as a template, we explore whether a characteristic pattern can be identified for each family, which would allow an unknown sequence to be classified quickly.
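The pipeline summarized above (pairwise scores, then PCC, then HC) can be illustrated with a minimal sketch. Several details are assumptions made only for illustration and are not stated in the abstract: each sequence is represented by its row of pairwise BLAST scores against all sequences, the distance between two sequences is taken as 1 minus the PCC of their rows, and clusters are merged by average linkage. The score matrix is hypothetical toy data; BLAST itself is not run here, and the function names are invented for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length score profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / (sx * sy)

def pcc_distance_matrix(scores):
    """Distance d(i, j) = 1 - PCC(row i, row j); this distance is an assumed choice."""
    n = len(scores)
    return [[1.0 - pearson(scores[i], scores[j]) for j in range(n)]
            for i in range(n)]

def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage (assumed linkage rule).

    Returns the tree as nested 2-tuples of labels.
    """
    members = {i: [i] for i in range(len(labels))}   # cluster id -> leaf indices
    trees = {i: labels[i] for i in range(len(labels))}
    next_id = len(labels)
    while len(members) > 1:
        ids = sorted(members)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # mean pairwise distance between the leaves of the two clusters
                total = sum(dist[p][q] for p in members[a] for q in members[b])
                d = total / (len(members[a]) * len(members[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        members[next_id] = members.pop(a) + members.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    return next(iter(trees.values()))

# Hypothetical all-against-all score matrix for four sequences (toy values).
labels = ["A", "B", "C", "D"]
scores = [
    [200, 180, 40, 35],
    [180, 210, 38, 42],
    [40, 38, 190, 170],
    [35, 42, 170, 205],
]
tree = average_linkage(pcc_distance_matrix(scores), labels)
print(tree)
```

On this toy input, A with B and C with D form the two sister groups. Because every pairwise distance is computed directly from the score matrix, listing the sequences in a different order does not change those distances; only ties between equal distances could be resolved differently.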