Models, Methods and Algorithms for Semi and Unstructured Data Mining
Date and place
Tue 14 July 09:00-12:00 - Sala Seminari DISCo
Wed 15 July 09:00-12:00 - Sala Seminari DISCo
Thu 16 July 09:00-12:00 - Sala Seminari DISCo
Tue 21 July 09:00-12:00 - Sala Seminari DISCo
Wed 22 July 09:00-12:00 - Sala Seminari DISCo
Thu 23 July 09:00-12:00 - Sala Seminari DISCo
Tue 28 July 09:00-12:00 - Sala Seminari DISCo
Typical Data Mining applications use structured information that has been carefully prepared. The data may be transformed by a data preparation process or, better yet, collected according to a design planned in advance for mining. The attributes to be used are clearly defined, together with the full range of values they can take, and they are then recorded uniformly for every example in the sample. The recipe is well known. Two types of information are expected: (a) ordered numerical and (b) categorical. Ordered numerical attributes have values for which greater-than and less-than comparisons are meaningful; for example, weight and income are obviously ordered. Categorical attributes are unordered numerical codes whose definitions are given in a codebook. The most common categorical attribute is one that can be measured as true or false, represented by a one or a zero. For example, gender can be recorded as male or female, and business category can be recorded as a code. The meaning of each code is documented elsewhere: it is not used by the learning program, but by the people interpreting the results of learning.
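To make the "uniform recording" of ordered numerical and categorical attributes concrete, here is a minimal sketch that turns one example into a numeric vector. The attribute names, the `CODEBOOK` contents and the function `encode_record` are hypothetical illustrations, not part of the course material. Expanding a multi-valued business-category code into one true/false (1/0) indicator per codebook entry is an assumed design choice: it reduces every categorical attribute to the common true/false case and avoids implying an order between codes.

```python
# Hypothetical codebook: maps business-category codes to their meaning.
# The learner only sees the codes; the meanings are for the people
# interpreting the results.
CODEBOOK = {
    1: "retail",
    2: "manufacturing",
    3: "services",
}


def encode_record(weight, income, is_female, category_code):
    """Turn one example into a uniform numeric vector (illustrative only).

    - weight, income: ordered numerical attributes, kept as numbers
    - is_female: a true/false attribute coded as 1 or 0
    - category_code: an unordered categorical code, expanded into one
      0/1 indicator per codebook entry so no spurious order is implied
    """
    if category_code not in CODEBOOK:
        raise ValueError(f"unknown category code: {category_code}")
    indicators = [1 if code == category_code else 0 for code in sorted(CODEBOOK)]
    return [float(weight), float(income), 1 if is_female else 0] + indicators


if __name__ == "__main__":
    # weight 70.5, income 42000, not female, business category 2
    print(encode_record(70.5, 42000, False, 2))
```

Every example encoded this way yields a vector of the same length, which is the uniform form that standard learning programs expect.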
To process semi-structured and unstructured data, we first need to transform it into a form to which Data Mining procedures can be applied. One of the main applications of semi-structured and unstructured Data Mining is Text Mining. Text Mining, also known as intelligent text analysis, text data mining, unstructured data management, or Knowledge Discovery in Text (KDT), generally refers to the process of extracting interesting and non-trivial information and knowledge (usually converted into metadata elements) from unstructured text (i.e. free text) stored in electronic form. This can be achieved either by adding markup in XML, Atom or RDF formats or by analysing common phrasings that indicate certain relationships. The course aims to provide a state-of-the-art overview of semi-structured and unstructured Data Mining. Application domains include, but are not limited to, Text Mining, Web Mining, Bioinformatics and Finance.
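One common way to turn free text into a form that Data Mining procedures can use is a bag-of-words representation: each document becomes a vector of word counts over a shared vocabulary. The sketch below assumes a deliberately naive tokenizer (lowercase, keep runs of ASCII letters and digits, so accented characters are dropped) and plain term-frequency counts; the function names are hypothetical, and real systems typically add stop-word removal, stemming and weighting schemes.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of a-z and 0-9.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    # One vector position per distinct word in the corpus, in alphabetical order.
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def to_vector(text, vocabulary):
    # Term-frequency vector: how many times each vocabulary word occurs.
    # Words not in the vocabulary are simply ignored.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    docs = ["Text mining finds patterns in text.",
            "Data mining needs structured data."]
    vocab = build_vocabulary(docs)
    print(vocab)
    for d in docs:
        print(to_vector(d, vocab))
```

For the two example documents the vocabulary is `['data', 'finds', 'in', 'mining', 'needs', 'patterns', 'structured', 'text']`, and the first document maps to `[0, 1, 1, 1, 0, 1, 0, 2]`. Each position is an ordered numerical attribute, so the result has the same uniform shape as carefully prepared structured data.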
The course is organized into the following main parts:
Part I: Text Mining and Text Mining Tasks.
Part II: From text to numerical vectors.
Part III: The learning methodology.
Part IV: Optimization Theory.
Part V: Support Vector Machines.
Part VI: Topic Extraction.
Part VII: Finding structure in documents.
Part VIII: Looking for information in docs.
Course format
Classroom assignments and an end-of-course project assignment.
References
Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines. Kluwer Academic Publishers.
Loton, T. (2002). Web Content Mining with Java. Wiley.
Manning, C.D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. The MIT Press.
Weiss, S.M., Indurkhya, N., Zhang, T. and Damerau, F.J. (2005). Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer.
(C) Copyright 2016 - Dipartimento Informatica Sistemistica e Comunicazione
Viale Sarca, 336
20126 Milano - Building U14
firstname.lastname@example.org - page last updated 02/05/2012