Negation and speculation detection in medical and review texts

Cruz Díaz, Noa Patricia

Supervised by:

Manuel Jesús Maña López Director

University of defense: Universidad de Huelva

Date of defense: 10 July 2014

Committee:

Manuel de Buenaga Rodríguez President
Jacinto Mata Vázquez Secretary
Mariana Lara Neves Rapporteur

Department:

TECNOLOGIAS DE LA INFORMACION

Type: Thesis

Teseo: 395167 DIALNET Arias Montano editor

Abstract

Negation and speculation detection has been an active research area in the Natural Language Processing community in recent years, and has been the subject of several Shared Tasks at relevant conferences. Many applications can benefit from identifying this kind of information (e.g., interaction detection, information extraction, sentiment analysis), yet doing so remains a challenge. This thesis aims to contribute to the ongoing research on negation and speculation in the Language Technology community through the development of machine-learning systems which detect speculation and negation cues and resolve their scope (i.e., identify at sentence level which tokens are affected by the cues). It focuses on the two domains in which negation and hedging have drawn the most attention: the biomedical and the review domains. In the first, the proposed method improves on the results reported to date for the sub-collection of clinical documents of the BioScope corpus. In the second, the novelty of the contribution lies in the fact that, to the best of our knowledge, this is the first system trained and tested on the SFU Review corpus annotated with negative and speculative information. It is also the first attempt to detect speculation in the review domain. Additionally, because of the tokenization problems encountered during the preprocessing of the BioScope corpus, and the small number of works in the literature which propose solutions to this problem, this thesis describes the issue in detail and provides both a comprehensive overview and an evaluation of a set of tokenization tools. This constitutes the first comparative evaluation study of tokenizers in the biomedical domain, which could help Natural Language Processing developers choose the most suitable tokenizer.