Data science and big data processing in R: representations and software

  1. Septem Riza, Lala
Supervised by:
  1. Francisco Herrera Triguero (supervisor)
  2. José Manuel Benítez Sánchez (supervisor)

Defended at: Universidad de Granada

Defense date: 17 July 2015

Committee:
  1. Antonio González Muñoz (chair)
  2. Manuel Gómez Olmedo (secretary)
  3. Matías Gámez Martínez (member)
  4. Luciano Sánchez Ramos (member)
  5. Antonio Peregrín Rubio (member)

Type: Dissertation

Abstract

The main objective of this thesis is the development of high-quality, easy-to-use software modules for representing, creating, and managing system models and for data analysis. R is the platform of choice, since it has become a de facto standard. The packages developed cover techniques based on fuzzy systems, rough sets, and fuzzy rough sets. In addition, a universal representation framework for fuzzy rule-based systems is introduced. Finally, the implementation of random forests and random ferns for tackling Big Data is discussed. In line with these objectives, the research produced the following results:

  1. The "frbs" package: an R package implementing the most relevant types of fuzzy rule-based systems, along with a selection of machine learning algorithms to build them. The package focuses on classification and regression tasks. It also includes a mechanism that allows human experts to construct a model. It is available on CRAN (http://cran.r-project.org/package=frbs) and on the project website (http://sci2s.ugr.es/dicits/software/FRBS).
  2. The "RoughSets" package: an R package implementing algorithms based on rough set theory and fuzzy rough set theory for knowledge representation and data analysis. It includes tools for handling missing values, discretization, feature selection, and instance selection, for both classification and regression tasks. It is available on CRAN (http://cran.r-project.org/package=RoughSets) and on the project website (http://sci2s.ugr.es/dicits/software/RoughSets).
  3. frbsPMML: a universal representation framework for fuzzy rule-based systems based on the Predictive Model Markup Language (PMML). Two software libraries to manage this representation are also implemented: an extension of the "frbs" package and the Java package "frbsJpmml".
  4. The "SparkFernTreeR" package: an R package implementing random forests and random ferns for Big Data processing. It is built on top of the Big Data frameworks Apache Hadoop and Apache Spark.
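Rough set theory, on which the "RoughSets" package is based, describes a target set of objects through its lower approximation (objects that certainly belong to it) and its upper approximation (objects that possibly belong to it), both built from classes of objects that are indiscernible on the chosen attributes. As a rough illustration only, the Python sketch below computes these approximations for a small decision table. It does not use or mirror the RoughSets API; the function names `indiscernibility_classes` and `approximations` and the toy table are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(objects, attributes):
    # Objects with identical values on every listed attribute are
    # indiscernible and fall into the same equivalence class.
    classes = defaultdict(set)
    for i, obj in enumerate(objects):
        key = tuple(obj[a] for a in attributes)
        classes[key].add(i)
    return list(classes.values())

def approximations(objects, attributes, target):
    # Lower approximation: union of classes fully contained in the target set.
    # Upper approximation: union of classes that overlap the target set.
    lower, upper = set(), set()
    for eq_class in indiscernibility_classes(objects, attributes):
        if eq_class <= target:
            lower |= eq_class
        if eq_class & target:
            upper |= eq_class
    return lower, upper

# Hypothetical toy decision table: two condition attributes, one decision.
table = [
    {"headache": "yes", "temp": "high", "flu": "yes"},
    {"headache": "yes", "temp": "high", "flu": "yes"},
    {"headache": "no", "temp": "high", "flu": "yes"},
    {"headache": "no", "temp": "high", "flu": "no"},
    {"headache": "no", "temp": "normal", "flu": "no"},
]
flu = {i for i, row in enumerate(table) if row["flu"] == "yes"}
lower, upper = approximations(table, ["headache", "temp"], flu)
print("lower:", sorted(lower))              # lower: [0, 1]
print("upper:", sorted(upper))              # upper: [0, 1, 2, 3]
print("boundary:", sorted(upper - lower))   # boundary: [2, 3]
```

Objects 2 and 3 share the same condition values but different decisions, so they land in the boundary region. Fuzzy rough sets, also covered by the package, replace these crisp equivalence classes with graded similarity relations.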