ERIC Number: ED559482
Record Type: Non-Journal
Publication Date: 2013
Pages: 123
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-3032-9294-1
Image/Time Series Mining Algorithms: Applications to Developmental Biology, Document Processing and Data Streams
Tataw, Oben Moses
ProQuest LLC, Ph.D. Dissertation, University of California, Riverside
Interdisciplinary research in computer science requires the development of computational techniques for practical application in different domains. This usually requires careful integration of different areas of technical expertise. This dissertation presents image and time series analysis algorithms, with practical interdisciplinary applications to developmental biology, historical manuscript processing, and data stream processing. Inspired by the NSF IGERT program, this dissertation presents algorithms for analysis of growth dynamics at the shoot apex of Arabidopsis thaliana. A robust understanding of the causal relationship between gene expression, cell behaviors, and organ growth requires the development of computational techniques for quantitative analysis of real-time, live-cell meristem growth data. This requires the development/application of image analysis tools and novel time series alignment algorithms. Image analysis is necessary for the computation of growth features, but this leads to a time series of unsynchronized growth data, which requires a robust alignment method. Towards this end, we present two time series alignment algorithms. This dissertation further considers image mining in historical document processing. An application of the Minimum Description Length principle (MDL) to develop a symbols clustering algorithm is presented. The developed algorithm produced one of the first practical applications of MDL to real-world, real-valued data such as images. Moreover, we introduce a novel premise that a clustering algorithm should have the freedom to "ignore" some data. Extensive empirical results show that the MDL-based algorithm outperforms the popular K-Means clustering algorithm, given the same input data, distance measure, and the correct value of K in K-means. The new algorithm could have significant impact, as clustering is a critical subroutine in almost all historical document processing systems.
Finally, we present an algorithm for detecting rare and approximately repeating sequences in unbounded real-valued data streams, given limited space. This algorithm employs the novel integration of SAX time series representation with a Bloom filter to develop a robust cache maintenance policy that allows us to overcome known challenges to a previously unsolved frequent pattern mining problem. Our contribution lies in the fact that we solve this problem for real-valued data, whereas only the discrete-valued case has been considered in the literature. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page:]
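As a rough illustration of the idea named in the abstract (pairing a SAX representation with a Bloom filter), the Python sketch below turns fixed-size windows of a real-valued stream into short SAX words and uses a small Bloom filter to flag words that have probably been seen before. This is not the dissertation's algorithm: the cache maintenance policy is not described in the abstract and is not reproduced here, and the names `sax_word`, `BloomFilter`, and `repeated_words`, the 4-letter alphabet, the window size, and the filter size are all assumptions chosen for the example.

```python
import hashlib
import math
from bisect import bisect_left

# Breakpoints that cut a standard normal into 4 equal-probability regions
# (assumed alphabet size of 4 for this sketch).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax_word(window, segments=4):
    """Z-normalize a window, average it per segment (PAA), and map to letters."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std < 1e-12:
        normed = [0.0] * n  # flat window: avoid dividing by zero
    else:
        normed = [(x - mean) / std for x in window]
    seg_len = n // segments  # assumes len(window) is a multiple of segments
    letters = []
    for i in range(segments):
        chunk = normed[i * seg_len:(i + 1) * seg_len]
        avg = sum(chunk) / len(chunk)
        letters.append(ALPHABET[bisect_left(BREAKPOINTS, avg)])
    return "".join(letters)


class BloomFilter:
    """Fixed-size bit array; membership tests may give false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def repeated_words(stream, window=8, segments=4):
    """Return (start index, SAX word) for windows whose word was probably seen."""
    seen = BloomFilter()
    buf, hits = [], []
    for i, x in enumerate(stream):
        buf.append(x)
        if len(buf) == window:
            word = sax_word(buf, segments)
            if word in seen:
                hits.append((i - window + 1, word))
            else:
                seen.add(word)
            buf = []  # non-overlapping windows keep the sketch simple
    return hits


# A rising ramp, a falling ramp, then a noisy copy of the rising ramp.
stream = (
    [0, 1, 2, 3, 4, 5, 6, 7]
    + [7, 6, 5, 4, 3, 2, 1, 0]
    + [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 5.9, 7.2]
)
hits = repeated_words(stream)
```

Because SAX discretizes the z-normalized shape, the noisy third window maps to the same word as the first, so the approximate repeat is reported at index 16 even though the raw values differ. The filter's memory stays fixed regardless of stream length, at the cost of occasional false positives.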
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A