Detección de plagio de documentos. Sistema externo monolingüe de altas prestaciones basado en n-gramas contextuales

  1. Rodríguez Torrejón, Diego Antonio
  2. Martín Ramos, José Manuel
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 49-58

Type: Article

Abstract

This paper presents a new approach to monolingual extrinsic plagiarism detection. It rests on three elements: a modification of the n-gram concept (the "contextual n-gram"), a new high-performance Information Retrieval engine built on that concept, and a new strategy ("referential monotony") for detecting plagiarized passages and delimiting their boundaries. The evaluation results are comparable to those obtained by the winning team at PAN'09, but they are achieved at a very low computational cost compared with other existing work: results are available in 30 to 45 minutes on a single laptop, without concurrent programming. This makes the proposal well worth exploiting.
