Senior Honors Thesis

Topic

My thesis is on ontology alignment algorithms.

What is an ontology?
An ontology is a set of concepts and the relationships between them. Ontologies are used to model domains such as weather or biology, often on the Semantic Web. An ontology is represented by a set of elements, including concepts (or "classes"), properties, and individuals, and the relations between them.
[Figure: Precipitation ontology] This is a graphical representation of a portion of the SWEET 2.0 Precipitation ontology. Each box represents a concept, and the text inside it is the concept's label. The arrows connecting the concepts define their relationships. For example, "Snow" is a "Precipitation" means that the "Snow" concept is a subset of the "Precipitation" concept, or that "Snow" is a child of "Precipitation."
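To make the "is a" relationship concrete, here is a minimal Python sketch of an ontology stored as child-to-parent links. The Ontology class, its method names, and the "WetSnow" concept are hypothetical and used only for illustration; only the "Snow" is a "Precipitation" link comes from the figure described above, and real ontologies are normally stored in a format such as OWL rather than in a structure like this.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A toy ontology: each concept label maps to the labels of its parents."""
    parents: dict = field(default_factory=dict)

    def add_is_a(self, child: str, parent: str) -> None:
        # Register both concepts, then record that `child` is a `parent`.
        self.parents.setdefault(parent, set())
        self.parents.setdefault(child, set()).add(parent)

    def ancestors(self, concept: str) -> set:
        # Follow is-a links upward, collecting every concept reached.
        seen = set()
        stack = list(self.parents.get(concept, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen


precipitation = Ontology()
precipitation.add_is_a("Snow", "Precipitation")  # from the figure
precipitation.add_is_a("WetSnow", "Snow")        # hypothetical concept, for illustration only

print(precipitation.ancestors("WetSnow"))
```

Because "is a" is transitive, asking for the ancestors of the hypothetical "WetSnow" concept returns both "Snow" and "Precipitation".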

What is ontology alignment?
Ontology alignment is the process of determining a set of correlations between the concepts of two ontologies. Currently, the process is semi-automatic: algorithms propose the correlations, and humans accept or reject them.
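The semi-automatic workflow can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the Correspondence record, the score field, the concept labels, and the review function are hypothetical names, and the threshold-based reviewer merely stands in for a human decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    source: str   # concept label from the first ontology
    target: str   # concept label from the second ontology
    score: float  # the algorithm's confidence, assumed to lie in [0, 1]


def review(proposals, accept):
    """Split proposed correspondences by a reviewer's accept/reject decision."""
    accepted, rejected = [], []
    for proposal in proposals:
        (accepted if accept(proposal) else rejected).append(proposal)
    return accepted, rejected


# Hypothetical proposals from an alignment algorithm.
proposals = [
    Correspondence("Snow", "Snowfall", 0.9),
    Correspondence("Snow", "Snowboard", 0.3),
]

# Stand-in for a human reviewer: accept anything scored 0.8 or higher.
accepted, rejected = review(proposals, lambda c: c.score >= 0.8)
```

In practice the accept callback would be an interactive prompt or a review interface rather than a fixed threshold.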

The purpose of ontology alignment is to develop high-level modeling across systems. It is useful for database integration and for fulfilling the connected vision of the Semantic Web.

Abstract

Aligning multiple ontologies has come to the forefront of the data integration field as a critical and complex problem without a single universally applicable solution. Many unique semi-automatic and automatic solutions have been proposed, each solving a piece of the alignment problem and leading to a more complete picture of how best to integrate the information of the Semantic Web. We present a context-sensitive ontology alignment algorithm that takes on one aspect of the problem: accurately identifying composite matches. These matches, which arise when one concept is equivalent to the combination of multiple others, are an often-overlooked part of an accurate alignment. Our algorithm identifies these matches, along with the typical one-to-one matches, by taking advantage of the information provided by a concept’s surrounding concepts, referred to as the “context” of a concept. Because the algorithm looks more broadly at concepts and the information that their relationships confer, it can identify both composite and non-composite matches. As a result, this methodology provides more accurate matches between ontologies that differ structurally. The process of identifying all matches begins with linguistic matching to determine a preliminary set of possible matches. It then uses contextual information, namely parent nodes, to filter the possible matches. It finishes with post-processing to identify the composite matches and to classify each match as fully confident, undetermined, or not confident.
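The three stages named in the abstract (linguistic matching, context filtering on parent nodes, and post-processing) can be illustrated with a rough Python sketch. This is not the thesis algorithm: the abstract does not specify the similarity measure, the thresholds, or the rule for detecting composite matches, so the use of difflib string similarity, the values 0.5 and 0.8, the grouping rule, and all function names and example concepts below are assumptions chosen only to show the shape of such a pipeline.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Linguistic step stand-in: character-level similarity of two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def context_similarity(s, t, source, target):
    """Context step stand-in: best similarity between any pair of parent labels."""
    scores = [label_similarity(p, q) for p in source[s] for q in target[t]]
    return max(scores, default=0.0)


def align(source, target, label_min=0.5, high=0.8, low=0.5):
    """source and target map each concept label to the set of its parent labels."""
    # 1. Linguistic matching: keep pairs whose labels look alike.
    candidates = [(s, t) for s in source for t in target
                  if label_similarity(s, t) >= label_min]

    # 2. Context filtering: score each surviving pair by its parent nodes.
    by_source = defaultdict(list)
    for s, t in candidates:
        by_source[s].append((t, context_similarity(s, t, source, target)))

    # 3. Post-processing: flag composite candidates, then bucket the rest.
    results = []
    for s, scored in by_source.items():
        strong = sorted(t for t, score in scored if score >= high)
        if len(strong) > 1:
            # Assumed rule: several strong targets under one source concept.
            results.append((s, tuple(strong), "composite candidate"))
            continue
        t, score = max(scored, key=lambda pair: pair[1])
        if score >= high:
            status = "confident"
        elif score >= low:
            status = "undetermined"
        else:
            status = "not confident"
        results.append((s, (t,), status))
    return results


# Hypothetical ontologies: concept label -> set of parent labels.
source = {"Snow": {"Precipitation"}, "Name": {"Person"}}
target = {"Snowfall": {"Precipitation"}, "Snowboard": {"Equipment"},
          "FirstName": {"Person"}, "LastName": {"Person"}}

for match in align(source, target):
    print(match)
```

In this toy input, "Snowboard" survives the linguistic step because its label resembles "Snow", but its parent differs, so the context step rules it out. "Name" keeps two strong targets that share its parent, so the sketch flags it as a composite candidate.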

Kelly Moran  |  kelly.moran@tufts.edu