The search for genomic information has only just begun. New genomes are sequenced daily, and each brings new challenges and new data that must be carefully mined to extract as much information as possible. The volume of data produced by genomic sequencing is far too great for researchers to analyze by hand, creating a need for computational tools that can process genomic input and analyze it for information. The field of bioinformatics combines computer science and biology, providing software applications that ease the workload of biologists.
One specific area of interest to biological researchers is the study of DNA words, or motifs, as they relate to gene regulation. These regulatory elements may be transcription factor binding sites (TFBS), at which transcription factors bind and help recruit RNA polymerase II to the DNA strand, or enhancer/silencer sequences that up- and down-regulate transcription of their associated gene by binding specific proteins. Many tools, such as Weeder, WordSpy and YMF, are currently available for the study of over- and under-represented words in a DNA sequence, a trait which is believed to be useful in the identification of these regulatory elements. These tools all perform similar tasks: they enumerate all words, or substrings, found in their input, then score and rank the resulting words for presentation to the user. Optionally, many tools also cluster groups of words together to form degenerate motifs, which allow for evolutionary and environmental variation in the binding site.
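The enumerate, score and rank steps described above can be illustrated with a minimal sketch. It is not taken from Weeder, WordSpy or YMF; the function names are hypothetical, and the scoring rule (observed count divided by the count expected if bases occurred independently at their overall frequencies) is only one simple assumption among many possible scoring schemes.

```python
from collections import Counter
from math import prod


def enumerate_words(sequences, k):
    """Count every length-k substring (word) across the input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def score_words(counts, sequences):
    """Score each word as observed / expected count.

    Assumption: the expected count comes from an independent-base
    background model estimated from the input itself.
    """
    base_counts = Counter("".join(sequences).upper())
    n_bases = sum(base_counts.values())
    freq = {base: c / n_bases for base, c in base_counts.items()}
    n_words = sum(counts.values())
    scores = {}
    for word, observed in counts.items():
        expected = n_words * prod(freq[base] for base in word)
        scores[word] = observed / expected
    return scores


def rank_words(scores):
    """Sort words from most over-represented to most under-represented."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "TTACGTT"]
    word_counts = enumerate_words(seqs, 3)
    for word, score in rank_words(score_words(word_counts, seqs)):
        print(word, word_counts[word], round(score, 3))
```

Scores above 1 indicate words seen more often than the background model predicts, and scores below 1 indicate words seen less often.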
The Open Word Enumeration Framework (OWEF), presented in this thesis, provides a new framework on which DNA word enumeration tools can be built. The framework provides a set of abstract base classes representing the core stages of a word enumeration tool and defines a standard interface for each stage, allowing multiple algorithmic implementations of these base classes to co-exist and be selected individually at runtime.
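The idea of abstract stages with interchangeable implementations chosen at runtime can be sketched as follows. This is an illustration only, not OWEF's actual API: every class name, the registry dictionaries and the `run_pipeline` function are hypothetical.

```python
from abc import ABC, abstractmethod
from collections import Counter


class WordCounter(ABC):
    """Hypothetical stage interface: turn sequences into word counts."""

    @abstractmethod
    def count(self, sequences, k):
        ...


class WordScorer(ABC):
    """Hypothetical stage interface: turn word counts into scores."""

    @abstractmethod
    def score(self, counts):
        ...


class SlidingWindowCounter(WordCounter):
    def count(self, sequences, k):
        counts = Counter()
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
        return counts


class RawCountScorer(WordScorer):
    def score(self, counts):
        return dict(counts)


class FrequencyScorer(WordScorer):
    def score(self, counts):
        total = sum(counts.values())
        return {word: c / total for word, c in counts.items()}


# Registries map a name given at runtime to a concrete implementation.
COUNTERS = {"sliding": SlidingWindowCounter}
SCORERS = {"raw": RawCountScorer, "frequency": FrequencyScorer}


def run_pipeline(sequences, k, counter="sliding", scorer="raw"):
    """Run the stages selected by name and return words ranked by score."""
    counts = COUNTERS[counter]().count(sequences, k)
    scores = SCORERS[scorer]().score(counts)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(run_pipeline(["ACGACG"], 3, scorer="frequency"))
```

Because each stage depends only on the interface of the stage before it, a new scoring algorithm can be added by writing one subclass and registering it under a new name, without touching the counting code.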
In addition to providing a level of abstraction that allows for simpler development, the framework also provides a scalable solution to alleviate memory bottlenecks. The framework contains skeleton code for both a shared memory implementation, providing fast analysis on single-node, multiprocessor systems, and a distributed memory solution, which splits the tasks among several networked nodes to provide a large amount of accessible main memory to the application.
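The two scaling strategies can be sketched in miniature. This is not OWEF's skeleton code; it rests on stated assumptions: a thread pool stands in for workers sharing one node's memory, and a hash of each word (here `zlib.crc32`, chosen only because it is deterministic) decides which hypothetical node would own that word's count in a distributed setting. All function names are illustrative.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(seq, k):
    """Count the length-k words in a single sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_memory_count(sequences, k, workers=4):
    """Count words with several workers, merging into one in-memory table.

    Assumption: threads stand in for shared-memory workers; real speedups
    in CPython would need processes or native code.
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda s: count_words(s, k), sequences):
            total.update(partial)
    return total


def owner_of(word, n_nodes):
    """Pick the node responsible for a word using a deterministic hash."""
    return zlib.crc32(word.encode("ascii")) % n_nodes


def distributed_shards(sequences, k, n_nodes):
    """Split the count table so each node holds only the words it owns."""
    shards = [Counter() for _ in range(n_nodes)]
    for seq in sequences:
        for word, c in count_words(seq, k).items():
            shards[owner_of(word, n_nodes)][word] += c
    return shards


if __name__ == "__main__":
    seqs = ["ACGTACGTAC", "TTACGTT", "GGGACGT"]
    print(shared_memory_count(seqs, 3))
    for node, shard in enumerate(distributed_shards(seqs, 3, 3)):
        print(node, dict(shard))
```

In the sharded version no single node stores the full table, which is the property that lets total accessible memory grow with the number of nodes.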
In summary, OWEF is useful as a development tool, providing a set of interfaces and methods that allow developers to focus on the specific aspects of the algorithms they are designing. It also provides researchers with a standardized, flexible interface, eliminating the need for specialized tools and providing a general-purpose toolkit for DNA word enumeration tasks.