Massive data analysis in the Web and post-genomic era

Prof. Bonizzoni

Date and venue

April 16th: Introduction Part I, 10:30 a.m.
Paola Bonizzoni (Università di Milano-Bicocca)

April 16th: Inferring Genetic Diversity from NGS, 3:30 p.m.
Niko Beerenwinkel (ETH Zurich)

April 17th: Inferring Genetic Diversity from NGS, 9:30 a.m.
Niko Beerenwinkel (ETH Zurich)

April 23rd: Introduction Part II, 1:00 p.m., Sala Seminari
Gianluca Della Vedova (Università di Milano-Bicocca)

April 24th: Introduction Part II, 2:00 p.m., Sala Riunioni (1st floor)
Gianluca Della Vedova (Università di Milano-Bicocca)

May 2nd: The Paradigm of Data Stream for Next Generation Internet, 2:00 p.m., Sala Seminari
Irene Finocchi (Università la Sapienza - Roma)

May 3rd: The Paradigm of Data Stream for Next Generation Internet, 10:00 a.m., Sala Seminari
Irene Finocchi (Università la Sapienza - Roma)

May 8th: Next Generation Sequencing analysis, 9:30 a.m., Sala Riunioni
Nadia Pisanti (Università di Pisa)

Motivation and Objectives

“The future of computing is not just big iron (computers). It’s big data.” Tom Kalil, deputy director of the White House Office of Science and Technology Policy.

“Big data refers to the rising flood of digital data from many sources, including the Web, biological and industrial sensors, video, e-mail and social network communications.”

“…On Thursday, the National Science Foundation will announce a joint program with the National Institutes of Health seeking new techniques and technologies for data management, data analysis and machine learning, which is a branch of artificial intelligence….”

From The New York Times (March 2012)


The course is aimed at PhD students who want to learn about the state of the art in data analysis methodologies in this fast-developing area. Next generation sequencing data allow the study of biological processes at an unprecedented level of detail. However, transforming such genomic data into valuable biological information is not an easy task. Moreover, the sheer size of the data poses new computational challenges.

The course consists of lectures given by experts in the field of massive data analysis, including data streaming techniques for Next Generation Internet and Web data. The course will provide a general overview of recent techniques for analyzing and managing massive data in various research frameworks.


Introduction (Part I and II)

The first lectures aim to provide the theoretical foundations required to tackle the research topics introduced in the course, as well as the main technological motivations of big data. The course is aimed at computer scientists, physicists, statisticians, genetic epidemiologists, bioinformaticians, and genome biologists, and seeks to open a discussion on the challenges and opportunities in next-generation sequencing data analysis and massive data analysis.

Part I: Massive data, deep sequencing and indexing techniques. Software tools.

Part II: Moore's Law: current trends and the big data revolution. Approaches to work splitting: parallel algorithms, MapReduce, data streaming.
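
As an illustration of the map-reduce style of work splitting listed in Part II, the following minimal Python sketch counts k-mers (substrings of length k) in a set of sequencing reads. It is not course material: the function names `map_read`, `reduce_counts` and `count_kmers` and the toy reads are assumptions made for this example, and everything runs sequentially in one process, whereas a real framework would distribute the map and reduce steps across machines.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_read(read: str, k: int) -> Counter:
    """Map step: count the k-mers of one read, independently of all other reads."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial count tables into one."""
    return left + right


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Apply the map step to every read, then fold the partial results together."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    partial_counts = (map_read(read, k) for read in reads)
    return reduce(reduce_counts, partial_counts, Counter())


if __name__ == "__main__":
    # Hypothetical toy reads, chosen only to show the call.
    print(count_kmers(["ACGT", "CGTA"], 2))
```

Because the map step looks at one read at a time and the reduce step is associative, the reads could in principle be split across workers in any order without changing the final table.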

Inferring genetic diversity from Next Generation sequencing

Niko Beerenwinkel, ETH Zurich, Computational Biology Group (Switzerland)

Genetic diversity is a hallmark of evolution and it plays a key role in the pathogenesis and treatment of rapidly evolving pathogens, such as viruses, bacteria, and cancer cells.

With high-coverage next-generation sequencing (NGS), the genetic diversity of mixed samples can be probed at an unprecedented level of detail in a cost-effective manner. However, NGS reads tend to be error-prone and relatively short, which complicates the detection of low-frequency variants and the reconstruction of long haplotype sequences. In this lecture, I will introduce computational and statistical challenges associated with genetic diversity estimation from NGS data. I will discuss several approaches to their solution based on probabilistic graphical models and on combinatorial optimization techniques. Two major applications will be presented: the genetic diversity of HIV within patients and the genetic diversity of cancer cells within tumors.

Part 1: Detecting low-frequency single-nucleotide variants (SNVs)

Part 2: Local haplotype inference and global quasispecies assembly
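
To make concrete why sequencing errors complicate the detection of low-frequency variants, here is a minimal Python sketch of a simple baseline test. It is not the method presented in the lecture: it assumes that every read at a site is wrong independently with the same known error rate, and asks how likely it is to see at least the observed number of variant reads from errors alone (a binomial tail probability). The function names and the numbers in the usage example are hypothetical.

```python
import math


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i), using lgamma to avoid overflow
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def looks_like_variant(coverage: int, variant_reads: int,
                       error_rate: float, alpha: float = 1e-3) -> bool:
    """Flag a site when errors alone are unlikely to explain the variant reads."""
    return binomial_tail(coverage, variant_reads, error_rate) < alpha


if __name__ == "__main__":
    # Hypothetical site: 1000 reads, assumed 1% error rate.
    print(looks_like_variant(1000, 30, 0.01))
    print(looks_like_variant(1000, 10, 0.01))
```

With an assumed 1% error rate and 1000 reads, about 10 mismatching reads are expected from errors alone, so a variant present at 1% frequency cannot be separated from noise by this test; this is the regime where more refined statistical models become necessary.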

The Paradigm of Data Stream for Next Generation Internet

Irene Finocchi, Università la Sapienza, Roma

Data stream algorithmics has gained increasing popularity in the last few years as an effective paradigm for processing massive data sets. A wide range of applications in computational sciences generate huge and rapidly changing streams of data that need to be continuously monitored and processed in one or a few sequential passes, using a limited amount of working memory. Despite the heavy restrictions on time and space resources imposed by this data access model, major progress has been achieved in the last ten years in the design of streaming algorithms for several fundamental data sketching and statistics problems. The lectures will give an overview of this rapidly evolving area and present basic algorithmic ideas, techniques, and challenges in data stream processing.
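
The one-pass, limited-memory model described above can be illustrated with a classic example, the Misra-Gries summary for frequent items. The Python sketch below is only an illustration chosen for this page, not necessarily an algorithm covered in the lectures; the function name `misra_gries` and the toy stream are assumptions. It reads the stream once and keeps at most k - 1 counters, yet any item occurring more than n/k times in a stream of length n is guaranteed to survive in the summary, with its count underestimated by at most n/k.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Single pass over the stream, storing at most k - 1 counters at any time."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical toy stream of 13 symbols in which "a" occurs 7 times.
    print(misra_gries("abacabadabaca", k=3))
```

The stored counts are lower bounds on the true frequencies, so a second pass over the data (when one is possible) can be used to compute exact counts for the surviving candidates.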

Next Generation Sequencing data analysis

Nadia Pisanti, Università di Pisa

New sequencing technologies have dramatically decreased costs and thus opened the way to new challenges in applications such as metagenomics and transcriptome analysis by means of sequences; in particular, the low cost of re-sequencing applied to the human genome opens the way to new issues in personalised medicine. As a consequence, a new phase has begun for genome research. From the point of view of the computer scientist, the management of huge amounts of data, the small size of sequenced fragments (with respect to previous technologies), and the new applications that move to sequences large volumes of data that used to be managed with arrays have led to several new problems in string algorithms. We will give an overview of these problems and of possible approaches to address them.


Short Biography


Niko Beerenwinkel, ETH Zurich

Niko Beerenwinkel was born in Düsseldorf, Germany. He studied mathematics, biology, and computer science, and received his Diploma degree in Mathematics from the University of Bonn in 1999 and his PhD in Computer Science from Saarland University in 2004. He was a postdoctoral researcher at the University of California at Berkeley (2004-2006) and at Harvard University (2006-2007) before joining ETH Zurich as assistant professor of computational biology.

Niko Beerenwinkel's research is at the interface of mathematics, statistics, and computer science with biology and medicine. His interests range from mathematical foundations of biostatistical models to clinical applications. Current research topics include haplotype inference from ultra-deep sequencing data, somatic evolution of cancer, reconstruction of signaling pathways from RNAi screens, HIV drug resistance, graphical models, and algebraic statistics.

He has authored over 50 research articles in the areas of computational biology, bioinformatics, biostatistics, virology, and cancer biology. His honors include the Otto Hahn Medal of the Max Planck Society and the Emmy Noether Fellowship of the German National Science Foundation.


Irene Finocchi, Università la Sapienza, Roma

Irene Finocchi obtained a PhD in Computer Science (2002) from Sapienza University of Rome, where she is currently Associate Professor at the Department of Computer Science. Her research interests include the design, theoretical analysis, and experimental evaluation of algorithms and data structures, focusing on algorithmics for massive data sets, algorithms resilient to memory faults, and algorithm engineering. More recently, she has been exploring the application of algorithmic theory for data-intensive scenarios to the design and implementation of dynamic program analysis tools. Irene Finocchi has been PC co-chair of ALENEX'09, the 11th SIAM Workshop on Algorithm Engineering and Experiments, and has served on the program committees of many major conferences in the field of algorithmics, including SODA (ACM-SIAM Symp. on Discrete Algorithms), ICALP (Int. Colloquium on Automata, Languages & Programming), and ESA (European Symp. on Algorithms). She is the recipient of a Distinguished Paper Award at OOPSLA 2011, the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications.


Nadia Pisanti, Università di Pisa

Nadia Pisanti obtained the DEA "Informatique Fondamentale et Applications" on the subject "Genome Analysis" from the University of Marne la Vallée, France, and a PhD in Computer Science in 2002 from the University of Pisa. Her research interests include Bioinformatics and algorithms for Computational Biology. She has been a visiting researcher at the Institut Pasteur in Paris, at INRIA Rhône-Alpes, at the University of Haifa, at LIACS in Leiden, at King's College London, and at the University of Lyon 1. Since 2006 she has been Research Assistant at the Department of Computer Science in Pisa. She has served on the Program Committees of many major conferences in Bioinformatics, including ICCABS, WABI and RECOMB.

Course format
Examination


Course material

Course material will be distributed during the course.


(C) Copyright 2016 - Dipartimento Informatica Sistemistica e Comunicazione - Viale Sarca, 336
20126 Milano - Building U14 - page last updated 02/05/2012