The success of the Semantic Web depends on the availability of ontologies and of web pages annotated with metadata conforming to these ontologies. Acquiring the necessary metadata by manually defining an information extraction system is a laborious task that requires considerable time and expert know-how. PANKOW (Pattern-based Annotation through Knowledge on the Web) is an automated, self-annotating Web method based on counting Google hits of instantiated linguistic patterns. It takes an unsupervised approach to categorizing instances with respect to an ontology, combining two ideas: using linguistic patterns to identify ontological relations, and using the Web as a large corpus to overcome data sparseness.
The system scans Web pages for phrases in the HTML text that might be categorized as instances of the ontology. Candidate phrases are proper nouns, identified by a standard part-of-speech tagging procedure. All candidate proper nouns...
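
The idea of counting hits for instantiated linguistic patterns can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three pattern templates, the function names, the naive pluralization with "s", the rule of summing counts across patterns and picking the highest-scoring concept, and the toy hit counts, which stand in for a real search-engine query and are not measured data.

```python
from typing import Callable, Dict, List, Optional

# Illustrative pattern templates (an assumption, not the method's actual pattern set).
PATTERNS: List[str] = [
    "{concept}s such as {instance}",
    "{instance} is a {concept}",
    "{instance} and other {concept}s",
]


def instantiate(instance: str, concept: str) -> List[str]:
    """Fill every pattern template with one (instance, concept) pair."""
    return [p.format(instance=instance, concept=concept) for p in PATTERNS]


def score_concepts(
    instance: str,
    concepts: List[str],
    hit_count: Callable[[str], int],
) -> Dict[str, int]:
    """Sum the hit counts of all instantiated patterns for each concept."""
    return {
        concept: sum(hit_count(query) for query in instantiate(instance, concept))
        for concept in concepts
    }


def annotate(
    instance: str,
    concepts: List[str],
    hit_count: Callable[[str], int],
) -> Optional[str]:
    """Return the concept with the highest total count, or None if nothing matched."""
    scores = score_concepts(instance, concepts, hit_count)
    best = max(scores, key=lambda c: scores[c])
    return best if scores[best] > 0 else None


# Hypothetical stand-in for a search-engine hit counter; the numbers are made up.
TOY_HITS: Dict[str, int] = {
    "hotels such as Hilton": 120,
    "Hilton is a hotel": 45,
    "Hilton and other rivers": 2,
}


def toy_hit_count(query: str) -> int:
    """Look up a phrase in the toy table; unknown phrases count as zero hits."""
    return TOY_HITS.get(query, 0)


if __name__ == "__main__":
    print(annotate("Hilton", ["hotel", "river"], toy_hit_count))
```

In practice `toy_hit_count` would be replaced by a function that issues each phrase as an exact-match query and returns the reported number of results; the scoring logic stays the same.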