Case-Based Automatic Recognition of the Semantic/Logical Structure of Series-Type HTML Documents: Toward Automatic Conversion from HTML to XML

Transactions of the Japanese Society for Artificial Intelligence 17 (6):690-698 (2002)

Abstract

Recognizing and extracting the semantic/logical structures in HTML documents are important and difficult tasks for intelligent document processing. In this paper, we show that alignment is an appropriate tool, within a framework of case-based reasoning, for recognizing the semantic structures inherently embedded in a series of HTML documents. That is, given a series of HTML documents and an example document whose semantic structures have been explicitly marked by a user, alignment can identify the semantic structures in the rest of the series by matching the text-block sequence of each HTML document against the text-block sequence of the example document. Several important properties of text documents, such as continuity and sequentiality of text, can be handled by alignment in a natural way. Alignment can significantly improve the capability of the case-based transformation method, which transforms a spatial and/or temporal series of HTML documents into machine-readable XML formats. Moreover, alignment greatly eases the construction of transformation examples. Through an experimental evaluation on 47 pages from 8 series of HTML documents, we show that the case-based method using alignment achieved a highly accurate transformation into XML formats.
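The matching step described in the abstract can be illustrated with a small sketch. The abstract does not say which alignment algorithm or similarity measure the authors use, so the code below assumes a standard global (Needleman-Wunsch style) alignment and a word-overlap similarity. The names `similarity`, `align_labels`, and `to_xml`, the constants `GAP` and `SHIFT`, and the sample text blocks are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

GAP = -0.25    # assumed penalty for skipping a block in either sequence
SHIFT = 0.25   # assumed offset so that unrelated blocks score below zero


def similarity(a, b):
    """Hypothetical block similarity: Jaccard word overlap minus SHIFT."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return -SHIFT
    return len(sa & sb) / len(sa | sb) - SHIFT


def align_labels(example, target):
    """Globally align labeled example blocks with unlabeled target blocks.

    example: list of (label, text) pairs marked up by the user.
    target:  list of text blocks from another page in the same series.
    Returns one label (or None) per target block.
    """
    n, m = len(example), len(target)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(example[i - 1][1], target[j - 1]),
                score[i - 1][j] + GAP,   # example block has no counterpart
                score[i][j - 1] + GAP,   # target block has no counterpart
            )

    # Trace back through the table to recover which blocks were paired.
    labels = [None] * m
    i, j = n, m
    while i > 0 and j > 0:
        sim = similarity(example[i - 1][1], target[j - 1])
        if score[i][j] == score[i - 1][j - 1] + sim:
            if sim > 0:  # only transfer a label for a genuinely similar pair
                labels[j - 1] = example[i - 1][0]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return labels


def to_xml(labels, target, root_tag="document"):
    """Emit labeled blocks as XML elements; labels must be valid XML names."""
    root = ET.Element(root_tag)
    for label, text in zip(labels, target):
        if label is not None:
            ET.SubElement(root, label).text = text
    return ET.tostring(root, encoding="unicode")


example = [
    ("title", "Weekly Weather Report"),
    ("date", "Date: 2002-01-07"),
    ("body", "Rain is expected in Tokyo"),
]
target = [
    "Weekly Weather Report",
    "Advertisement banner",
    "Date: 2002-01-14",
    "Snow is expected in Osaka",
]

labels = align_labels(example, target)
print(labels)
print(to_xml(labels, target))
```

Because the alignment preserves block order, the unlabeled advertisement block in the target page is skipped as a gap rather than forcing the later blocks out of position, which is one way the sequentiality property mentioned in the abstract can be exploited.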


Similar books and articles

Support for Creating Presentation Materials from Papers. 武市 雅司 安村 禎明 - 2003 - Transactions of the Japanese Society for Artificial Intelligence 18 (4):212-220.
ITS for CSS and HTML.Mariam Elawar & Bastami Bashhar - 2017 - International Journal of Academic Research and Development 2 (1):94-99.
CSS-Tutor: An Intelligent Tutoring System for CSS and HTML.Mariam W. Alawar & Samy S. Abu Naser - 2017 - International Journal of Academic Research and Development 2 (1):94-99.
A Daily-Work Support System Using CBR Based on XML Representation. Suzuki Sachiko Yasumura Yoshiaki - 2003 - Transactions of the Japanese Society for Artificial Intelligence 18 (4):183-192.
A Framework for Dynamic Decomposition and Reconstruction of Narrative Structure in Document Collections. 赤石 美奈 - 2006 - Transactions of the Japanese Society for Artificial Intelligence 21 (5):428-438.
