Querying Large Biological Network Datasets.

Gulsoy, Gunhan

New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store this genetic interaction data. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets. We begin by treating the querying of large biological network datasets as a collection of individual comparisons. First, we define measures of similarity between two networks. To do this, we develop an algorithm to align two biological networks. Then, we validate this algorithm using biological evidence. This algorithm incorporates both the sequence similarity of nodes and the topology of the network itself. Then, we formulate an algorithm that uncovers a hierarchical structure in transcriptional regulatory networks. This algorithm works purely on the topology of the network. Using this method, we show relations between the functions of genes and the topological properties of networks. Finally, we analyze functional properties of metabolic networks. We first calculate the elementary flux modes of the metabolic networks. Then, for each genetic function, we analyze relations between functions and metabolic flux cones. With this method, we aim to analyze networks functionally. Next, we consider large datasets. In biological networks, pairwise operations are costly. Therefore, exhaustive comparison of all networks with a query is infeasible. To tackle this problem, we develop reference-based indexing. In this method, we first build a small set of reference networks. Then, instead of aligning a query with all the database networks, we use the references to calculate upper and lower bounds for the alignment scores between the query and all the database networks. Using these bounds, we process 80% of the database networks quickly.
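The reference-based indexing idea described above can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes a distance between networks that satisfies the triangle inequality (a stand-in for the alignment scores), and it uses a toy edge-set distance in place of a real alignment. All names (`edge_distance`, `build_index`, `range_query`) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Toy representation (an assumption): a network is a set of directed edges.
Network = FrozenSet[Tuple[str, str]]
Distance = Callable[[Network, Network], int]


def edge_distance(a: Network, b: Network) -> int:
    """Toy metric: number of edges present in exactly one of the two networks."""
    return len(a ^ b)


def build_index(database: Dict[str, Network],
                references: List[Network],
                dist: Distance) -> Dict[str, List[int]]:
    """Precompute the distance from every reference to every database network."""
    return {name: [dist(ref, net) for ref in references]
            for name, net in database.items()}


def range_query(query: Network,
                radius: int,
                database: Dict[str, Network],
                references: List[Network],
                index: Dict[str, List[int]],
                dist: Distance) -> Tuple[List[str], int]:
    """Return networks within `radius` of the query and the number of
    full comparisons that the bounds could not avoid."""
    q_to_ref = [dist(query, ref) for ref in references]
    hits: List[str] = []
    full_comparisons = 0
    for name, net in database.items():
        ref_to_net = index[name]
        # Triangle inequality: |d(q,r) - d(r,x)| <= d(q,x) <= d(q,r) + d(r,x)
        lower = max(abs(q - r) for q, r in zip(q_to_ref, ref_to_net))
        upper = min(q + r for q, r in zip(q_to_ref, ref_to_net))
        if lower > radius:
            continue              # pruned: cannot be within the radius
        if upper <= radius:
            hits.append(name)     # accepted without a full comparison
            continue
        full_comparisons += 1     # bounds are inconclusive: compare directly
        if dist(query, net) <= radius:
            hits.append(name)
    return sorted(hits), full_comparisons
```

Only the networks whose bounds straddle the radius need a full comparison; the rest are accepted or discarded from the precomputed reference distances alone. How many networks fall into that second group depends on the choice of references.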
We experimentally show that we can successfully mine statistically and biologically significant relationships in a large database of biological networks. Finally, we propose a new method that uses a dynamic set of references in reference-based indexing. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]