An algorithm of discovering signatures from DNA databases on a computer cluster

doi:10.1186/1471-2105-15-339


Please use this identifier to cite or link to this item: https://ir.csmu.edu.tw:8080/ir/handle/310902500/10390

Title:	An algorithm of discovering signatures from DNA databases on a computer cluster
Authors:	Lee, Hsiao Ping; Sheu, Tzu-Fang
Contributors:	Chung Shan Medical University
Keywords:	Signature discovery;Computer clusters;Divide-and-conquer strategies
Date:	2014-10
Issue Date:	2015-03-04T04:50:08Z (UTC)
ISSN:	1471-2105
Abstract:	Background: Signatures are short sequences that are unique and not similar to any other sequence in a database, so they can be used as the basis for identifying different species. Several signature discovery algorithms have been proposed, but they require the entire database to be loaded into memory, which restricts the amount of data they can process and makes them unable to handle large databases. They also use sequential models, so their discovery speed leaves room for improvement. Results: This research applies a divide-and-conquer strategy to signature discovery for the first time and proposes a parallel signature discovery algorithm for a computer cluster. The divide-and-conquer strategy removes the limitation that prevents existing algorithms from processing large databases, and a parallel computing mechanism improves the efficiency of signature discovery. Even with only the memory of a regular personal computer, the algorithm can process large databases, such as the human whole-genome EST database, that existing algorithms could not process. Conclusions: The proposed algorithm is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large-database analysis and processing. The implementation of the proposed algorithm is available at
URI:	https://ir.csmu.edu.tw:8080/ir/handle/310902500/10390 http://dx.doi.org/10.1186/1471-2105-15-339
Relation:	BMC Bioinformatics 2014, 15:339
Appears in Collections:	[Department of Applied Information Sciences and Master's Program] Journal Articles
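
The divide-and-conquer idea in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: it assumes a simplified definition in which a signature is a length-k substring that occurs in exactly one database sequence, and it omits the "not similar to any other sequence" check entirely. The function names, the prefix-based partitioning, and the thread pool are all illustrative assumptions. Each prefix bucket is processed independently, so only one bucket's k-mers need to be held in memory at a time, and the buckets could in principle be handed to separate workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_in_bucket(database, k, prefix):
    """Return {kmer: sequence_id} for k-mers that start with `prefix`
    and occur in exactly one database sequence (exact matching only)."""
    owners = defaultdict(set)
    for seq_id, seq in database.items():
        for kmer in kmers(seq, k):
            if kmer.startswith(prefix):
                owners[kmer].add(seq_id)
    return {kmer: next(iter(ids)) for kmer, ids in owners.items() if len(ids) == 1}


def discover_signatures(database, k, prefixes=("A", "C", "G", "T"), workers=4):
    """Split the k-mer space by prefix (divide), solve each bucket
    independently (conquer), then merge the per-bucket results."""
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        buckets = pool.map(lambda p: unique_in_bucket(database, k, p), prefixes)
        for bucket in buckets:
            result.update(bucket)
    return result


if __name__ == "__main__":
    # Hypothetical toy database, not data from the paper.
    db = {"s1": "ACGTAC", "s2": "CGTACG"}
    print(discover_signatures(db, 4))
```

In CPython a thread pool mainly shows the structure rather than delivering a speedup; a process pool or separate cluster nodes would be needed for real parallelism, and a longer prefix gives smaller buckets at the cost of more passes over the data.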

Files in This Item:

File	Description	Size	Format
7.pdf	Journal	536Kb	Adobe PDF	550
index.html		0Kb	HTML	595
