Models, Methods and Algorithms for Semi and Unstructured Data Mining
Fabio Stella
Date and venue

Tue 14 July 09:00-12:00 - Sala Seminari DISCo
Wed 15 July 09:00-12:00 - Sala Seminari DISCo
Thu 16 July 09:00-12:00 - Sala Seminari DISCo
Tue 21 July 09:00-12:00 - Sala Seminari DISCo
Wed 22 July 09:00-12:00 - Sala Seminari DISCo
Thu 23 July 09:00-12:00 - Sala Seminari DISCo
Tue 28 July 09:00-12:00 - Sala Seminari DISCo


Motivation and objectives
Typical Data Mining applications use structured information that is carefully prepared. The data may be transformed by a data preparation process or, better yet, collected according to a careful prior design for mining. The attributes to be used are clearly defined over the full range of possible values, and they are recorded uniformly for every example in the sample. The recipe is well known. Two types of information are expected: (a) ordered numerical and (b) categorical. Ordered numerical attributes take values for which greater-than or less-than comparisons are meaningful; weight and income, for example, are clearly ordered. Categorical attributes are unordered numerical codes whose definitions are given in a codebook. The most common categorical attribute is one that can be measured as true or false, represented by a one or a zero. For example, gender can be recorded as male or female, and business category can be recorded as a code. The meaning of each code is described elsewhere; it is used not by the learning program but by the individuals interpreting the results of learning.
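As a minimal sketch of the structured representation described above, the following Python fragment records two examples with the same attributes in the same order and keeps the codebook separate from the data. The attribute names, code values and records are illustrative assumptions and do not come from the course material.

```python
# Hypothetical codebook: it gives the meaning of each categorical code.
# It is meant for people interpreting results, not for the learning program.
CODEBOOK = {
    "gender": {0: "male", 1: "female"},
    "business_category": {1: "retail", 2: "manufacturing", 3: "services"},
}

# Every example records the same attributes, in the same order.
ATTRIBUTES = ["weight", "income", "gender", "business_category"]

# Ordered numerical attributes: greater-than / less-than comparisons have meaning.
ORDERED = {"weight", "income"}

# Two made-up examples (assumed values, for illustration only).
records = [
    [72.5, 31000.0, 0, 2],
    [58.0, 45000.0, 1, 3],
]


def describe(record):
    """Decode categorical codes through the codebook; keep ordered values as they are."""
    decoded = {}
    for name, value in zip(ATTRIBUTES, record):
        decoded[name] = value if name in ORDERED else CODEBOOK[name][value]
    return decoded


decoded = [describe(r) for r in records]

# Comparing weights is meaningful because weight is an ordered attribute.
heavier = records[0][0] > records[1][0]

# Comparing business_category codes (2 > 3?) would carry no meaning,
# because the codes are unordered labels.
```

The learning program would see only the numeric rows in `records`; `describe` exists for a person reading the results.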

To apply Data Mining procedures to semi-structured and unstructured data, we first need to convert the data into a form that those procedures can use. One of the main applications of semi-structured and unstructured Data Mining is Text Mining. Text Mining, also known as intelligent text analysis, text data mining, unstructured data management, or Knowledge Discovery in Text (KDT), refers generally to the process of extracting interesting and non-trivial information and knowledge (usually converted to metadata elements) from unstructured text (i.e. free text) stored in electronic form. This can be achieved either through added markup in XML, Atom or RDF formats or through the analysis of common phraseologies indicating certain relationships. The course aims to provide a state-of-the-art overview of semi-structured and unstructured Data Mining. Application domains include, but are not limited to, Text Mining, Web Mining, Bioinformatics and Finance.
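One simple way to turn free text into a form that Data Mining procedures can use is a bag-of-words count vector. The sketch below is an illustration under stated assumptions, not the method taught in the course: the tokenizer (lowercase alphabetic runs), the function names and the two sample documents are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the collection to a fixed vector position."""
    terms = sorted({token for doc in documents for token in tokenize(doc)})
    return {term: index for index, term in enumerate(terms)}


def to_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for term, n in counts.items():
        index = vocabulary.get(term)
        if index is not None:  # ignore terms outside the vocabulary
            vector[index] = n
    return vector


# Two made-up documents, for illustration only.
documents = [
    "Text mining extracts knowledge from text.",
    "Data mining uses structured data.",
]

vocabulary = build_vocabulary(documents)
vectors = [to_vector(doc, vocabulary) for doc in documents]
```

Each document becomes a row of ordered numerical attributes, one per vocabulary term, which is the uniform, structured form that standard learning procedures expect.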

The course is organized into the following main parts:

Part I: Text Mining and Text Mining Tasks.
Part II: From text to numerical vectors.
Part III: The learning methodology.
Part IV: Optimization Theory.
Part V: Support Vector Machines.
Part VI: Topic Extraction.
Part VII: Finding structure in documents.
Part VIII: Looking for information in documents.
Teaching format
Lectures
Assessment
Classroom assignments and an end-of-course project.
Course materials
Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines, Kluwer Academic Publishers.
Loton, T. (2002). Web Content Mining with Java, Wiley.
Manning, C.D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing, The MIT Press.
Weiss, S.M., Indurkhya, N., Zhang, T. and Damerau, F.J. (2005). Text Mining: Predictive Methods for Analyzing Unstructured Information, Springer.


(C) Copyright 2016 - Dipartimento Informatica Sistemistica e Comunicazione - Viale Sarca, 336
20126 Milano - Edificio U14 - page last updated 02/05/2012