Case-Based Automatic Recognition of Semantic and Logical Structures in Series-Type HTML Documents: Toward Automatic Transformation from HTML to XML

Iwanuma Koji Umehara Masayuki

Transactions of the Japanese Society for Artificial Intelligence 17 (6):690-698 (2002)

Abstract

Recognizing and extracting the semantic and logical structures of HTML documents is an important and difficult task in intelligent document processing. In this paper, we show that alignment is an appropriate tool, within a framework of case-based reasoning, for recognizing the semantic structures inherently embedded in a series of HTML documents. Given a series of HTML documents and an example document whose semantic structures are explicitly indicated by a user, alignment identifies the semantic structures in the series by matching the text-block sequence of each HTML document against the text-block sequence of the example document. Several important properties of text documents, such as the continuity and sequentiality of text, are handled by alignment in a natural way. Alignment significantly improves the capability of the case-based transformation method, which transforms a spatial and/or temporal series of HTML documents into machine-readable XML formats. Moreover, alignment dramatically eases the construction of transformation examples. Through an experimental evaluation on 47 pages from 8 series of HTML documents, we show that the case-based method using alignment achieves highly accurate transformation into XML formats.
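
The matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say which alignment procedure, scoring function, or gap penalty the paper uses. The sketch assumes a standard global (Needleman-Wunsch style) alignment over text blocks. The `similarity` function, the `GAP` value, and the `align_labels` name are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

GAP = -0.5  # assumed gap penalty; the abstract does not specify one


def similarity(a: str, b: str) -> float:
    """Hypothetical block similarity in [-1, 1], based on character overlap."""
    return 2.0 * SequenceMatcher(None, a, b).ratio() - 1.0


def align_labels(example, target):
    """Globally align target text blocks to a labelled example sequence.

    example: list of (label, text) pairs, as marked up by a user.
    target:  list of text blocks from another page in the same series.
    Returns one label (or None for an unmatched block) per target block.
    """
    n, m = len(example), len(target)
    # score[i][j] = best score aligning the first i example blocks
    # with the first j target blocks.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(example[i - 1][1], target[j - 1]),
                score[i - 1][j] + GAP,  # example block has no counterpart
                score[i][j - 1] + GAP,  # target block has no counterpart
            )
    # Trace back from the bottom-right corner, copying labels on matches.
    labels = [None] * m
    i, j = n, m
    while i > 0 and j > 0:
        match = score[i - 1][j - 1] + similarity(example[i - 1][1], target[j - 1])
        if score[i][j] == match:
            labels[j - 1] = example[i - 1][0]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return labels
```

Because the alignment preserves block order, a label can only be transferred to a target block that appears in the same relative position as its counterpart in the example. This is one way to respect the sequentiality property the abstract mentions.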

Keywords

alignment, case-based transformation, semantic structure, HTML, XML

DOI

10.1527/tjsai.17.690
