
Bioinformatics
Phylogenomics
All proteins have evolved from other proteins, and their evolutionary relationships form an extensive family tree. Proteins that are closely related on this tree are more likely to have similar structures and functions than distantly related proteins. Phylogenetic methods initially developed to study the “tree of life” are now applied to genomic data; this approach is called phylogenomics. In the few years that phylogenomics has been used for protein structure prediction, it has proven capable of improving prediction accuracy. However, current phylogenomic methods use only a small fraction of the information that is relevant to protein structure prediction.
Next-generation methods will be based on models of the actual process of protein evolution. We will use these models to assess the likelihood that an unknown protein has the same function as the members of a known protein family. Visualization of models of protein evolution and their predictions will guide the development of models that are sensitive to the most relevant parameters. Interactive exploration of this model-structure space will be used to select the best overall model parameterization and to identify the parts of a structure that are most strongly characteristic of a particular protein family. These visualizations will use the unique virtual reality capabilities of LITE.
The LITE Center in UL Lafayette's Research Park

Object Databases
The growth of scientific knowledge has been accompanied by an increase in the complexity of its constituent data. At present, the potential for integrating knowledge from different sources is limited by our capacity to assimilate and synthesize it. Relational databases have enhanced the collection and accessibility of knowledge in some areas, but they are not well suited to integrating the complex, heterogeneous data of biology. Object databases, which directly map the attributes and behavior of real-world entities to program objects, provide an alternative that we are exploring. The very language used to describe object databases suggests a correspondence with biology: heterogeneous objects within a class are polymorphic, and new classes of objects can be derived from old ones by inheritance. During the 1990s, relational databases, which are well suited to business applications, came to dominate commercial markets, and interest in object databases declined. However, the rise of XML as a lingua franca for data exchange has provided a new rationale for object databases, which can more easily match the structure of XML documents.
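To make the vocabulary of inheritance and polymorphism concrete, and to show how nested objects can line up with nested XML elements, here is a minimal Python sketch. The class names (BioEntity, Gene, Genome), their fields, and the XML layout are hypothetical illustrations, not the schema of any database described here.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class BioEntity:
    """Hypothetical base class: a named entity that can describe itself."""
    name: str

    def describe(self) -> str:
        return f"{type(self).__name__} {self.name}"

    def to_xml(self) -> ET.Element:
        # The element tag mirrors the class name, so the document
        # structure follows the object structure.
        element = ET.Element(type(self).__name__)
        element.set("name", self.name)
        return element


@dataclass
class Gene(BioEntity):
    """Derived by inheritance; overrides describe() and to_xml()."""
    sequence: str = ""

    def describe(self) -> str:
        return f"{super().describe()} ({len(self.sequence)} bp)"

    def to_xml(self) -> ET.Element:
        element = super().to_xml()
        ET.SubElement(element, "sequence").text = self.sequence
        return element


@dataclass
class Genome(BioEntity):
    """A container object; its children become nested XML elements."""
    genes: list = field(default_factory=list)

    def describe(self) -> str:
        return f"{super().describe()} with {len(self.genes)} gene(s)"

    def to_xml(self) -> ET.Element:
        element = super().to_xml()
        for gene in self.genes:
            element.append(gene.to_xml())
        return element


if __name__ == "__main__":
    genome = Genome("mtDNA", [Gene("COI", "ATGTTC")])
    # The same call works on any BioEntity (polymorphism).
    for entity in [genome, *genome.genes]:
        print(entity.describe())
    print(ET.tostring(genome.to_xml(), encoding="unicode"))
```

Because each class serializes itself, a new kind of entity can be added by deriving another class without changing the code that walks the object tree.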
Population Genetics Database
At present there are no comprehensive databases for population genetics. Unfortunately, much data is being lost because complete population genetic data sets are seldom published in the scientific literature (Leberg and Neigel, 1999). This is especially tragic when it becomes impossible to repeat population genetic studies that would have been valuable for conservation or comparative purposes.
In collaboration with P. Leberg and with funding from the NSF, we developed a prototype population genetic database for animal mitochondrial DNA. It is an object database that provides common abstractions of entities such as genotypes, individuals, locations, and populations, as well as standard measures of genetic diversity and divergence. This built-in functionality allows meaningful comparisons to be made among heterogeneous types of data, such as sequence data and restriction fragment length polymorphism (RFLP) data.
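The idea of a common abstraction over heterogeneous data can be sketched as follows. This is an illustration only, not the prototype's actual design: the class names, the haplotype_key method, and the choice of Nei's gene (haplotype) diversity as the example measure are all assumptions made for the sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, List, Tuple


class Genotype:
    """Hypothetical common abstraction for any kind of genotype data."""

    def haplotype_key(self) -> Hashable:
        raise NotImplementedError


@dataclass(frozen=True)
class SequenceGenotype(Genotype):
    sequence: str

    def haplotype_key(self) -> Hashable:
        # Two sequences are the same haplotype if they match, ignoring case.
        return ("seq", self.sequence.upper())


@dataclass(frozen=True)
class RFLPGenotype(Genotype):
    fragment_sizes: Tuple[int, ...]  # restriction fragment lengths in bp

    def haplotype_key(self) -> Hashable:
        # Fragment order is irrelevant, so compare sorted fragment sizes.
        return ("rflp", tuple(sorted(self.fragment_sizes)))


@dataclass
class Individual:
    label: str
    location: str
    genotype: Genotype


@dataclass
class Population:
    name: str
    individuals: List[Individual]

    def haplotype_diversity(self) -> float:
        """Nei's gene diversity: h = n / (n - 1) * (1 - sum of p_i squared)."""
        n = len(self.individuals)
        if n < 2:
            raise ValueError("need at least two individuals")
        counts = Counter(i.genotype.haplotype_key() for i in self.individuals)
        sum_sq = sum((c / n) ** 2 for c in counts.values())
        return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    seq_pop = Population("site A", [
        Individual(f"a{k}", "site A", SequenceGenotype(s))
        for k, s in enumerate(["ACGT", "acgt", "ACGA", "ACGC"])
    ])
    rflp_pop = Population("site B", [
        Individual(f"b{k}", "site B", RFLPGenotype(f))
        for k, f in enumerate([(500, 300), (300, 500), (800,), (800,)])
    ])
    # The same measure applies to both data types.
    print(seq_pop.haplotype_diversity(), rflp_pop.haplotype_diversity())
```

Because the diversity measure depends only on haplotype_key, it can be computed for sequence-based and RFLP-based populations alike, which is the kind of comparison across data types the paragraph above describes.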
We have begun preliminary work on a second-generation XML database for population genetics that will take the form of a suite of integrated tools and services rather than a single database management system.