Mejorando las técnicas de verificación de wrappers web mediante técnicas bioinspiradas y de clasificación

  1. Fernández de Viana González, Iñaki Josep
Supervised by:
  1. Pedro José Abad Herrera Director
  2. José Luis Arjona Fernández Director
  3. José Luis Álvarez Macías Director

Defence university: Universidad de Huelva

Defence date: 22 January 2016

Committee:
  1. Rafael Corchuelo Gil Chair
  2. Javier Aroba Páez Secretary
  3. M. I. García Arenas Committee member
Department:
  1. TECNOLOGIAS DE LA INFORMACION

Type: Thesis

Abstract

Many enterprise applications rely on wrappers to handle information from the deep web. Wrappers are automated systems that navigate the web, extract information, reveal its structure, and verify it. One of their components, the information extractor, consists of a set of extraction rules that are usually based on HTML tags. Consequently, when a source changes, the wrapper may return data the company does not want, which at best delays its decision-making process. Several wrapper verification systems have been developed to detect automatically when a wrapper is extracting incorrect data. These systems have a number of shortcomings that stem from assuming that the data to be verified follow a set of preestablished statistical distributions. This dissertation analyzes these systems, designs a framework for developing verifiers, and approaches the verification problem from two different points of view. First, we frame it as a computational optimization problem and solve it with a bio-inspired metaheuristic based on ant colonies, specifically the BWAS algorithm. Second, we formulate and solve it as an unsupervised classification problem. The result of this second approach is MAVE, a multilevel verifier built mainly on one-class classifiers.