Data Science: Techniques and Applications

Click here for the 2022-23 edition

Time: Tuesday, 18:00 - 21:00 during weeks 2-11 of Summer Term.

Place: HyFlex
Labs MAL 404/405, and
MS Teams via Moodle (paywall): DSTA.

Module Coordinator: Alessandro Provetti

Teaching Assistants: Abul Hasan, Paschalis Lagias and Alberto Matuozzo.

Contents, resources and study materials:
the calendar below gives a general overview of the module.
Presentations, their order and the study materials are continually reviewed, updated and amended.
The study materials may become final only at the end of the module. For a preview of the study programme, please see the shaded part below.


How to read the programme table
White background for regular lectures with slides and note-taking.
Light-blue background for online lab experiences.
Grey background for work in progress or extra reference material (not examined).
Gold background for in-class assessments.
Date Unit Where Presentation (by revealjs) Resources PDF (by decktape or by revealjs)
Apr. 16 Week 0 (no class)
Apr. 23 Week 1 (no class)
Apr. 30 2.a Class Class presentation Quarto PDF
2.b Class Data Science as 9 problems Quarto
From Provost-Fawcett's textbook:
  1. PF-ch. 2: Excerpts
PDF
2.c Class Math Concepts for Data Science Quarto
From Goodfellow et al. textbook:
  1. GBC-Ch. 2: Excerpts
PDF
NEW Lab Relevant Python modules:
  1. NumPy
  2. pandas
Quarto Quarto for
  1. NumPy
  2. pandas
Jupyter notebook for
  1. NumPy
  2. pandas
PDF:
  1. NumPy
  2. pandas
May 7 3.a Class Spectral Methods Quarto PDF
3.b Class Information Entropy Quarto for
  1. lecture
  2. pen-and-paper exercise
  3. for reference only, an advanced lecture on divergence
PDF
3.c Class Classification: The Iris Dataset Quarto
For reference: Excerpts from Zaki-Meira textbook.
PDF
3.d Lab 2D visualisation Quarto
  1. Download a Seaborn notebook
PDF
May 14 4.a Class Eigenpairs Quarto
From Leskovec et al. textbook (MMDS):
  1. MMDS-Ch. 11 Excerpts, part A
PDF
4.b Class The Gini index Quarto PDF
4.c Class Decision trees Quarto
  1. PF-ch. 3: Predictive Modelling
PDF
4.d Lab Introduction: the k-NN algorithm
Classification with Scikit-learn
  1. baseline notebook
    Click here to see it on Colab
    A solution notebook is also available from the repo.
  2. k-NN Quarto
k-NN PDF
The lab presentation is in remarkjs format
Extra Non-binary classification

Evaluating Classification Performance
May 21 5.a Class High-dimensional data Quarto PDF
5.b Text as data Quarto PDF
5.c Lab Live coding experience: implementing decision trees. This lab experience will be conducted on Colab
Click here to see it on Colab
  1. local datafile
    baseline notebook
    solution notebook
PDF
5.d Lab Computing Eigenvalues and Eigenvectors Quarto PDF
May 28 New! Online In-class quiz
6.a Class Singular-value Decomposition Quarto PDF
6.b Natural Language Processing with Entropy Quarto PDF
6.c Class Introduction to Network models: Food Webs Quarto
From Caldarelli-Chessa textbook (CC):
  1. CC-Ch.1: Food webs Excerpts
PDF
Jun 4 7.a Class Latent dimensions
  1. MMDS Ch. 11 excerpt, part B
  2. Code of the SVD example
    Click here to see it on Colab
  3. Watch the original video presentation on YouTube or on the textbook website
PDF
7.b Class Rating and ranking: Massey's ranking Quarto
From Langville-Meyer's textbook (LM):
  1. LM-ch.2: Massey's method
PDF
7.c Class Trade Networks Quarto

From Caldarelli-Chessa textbook (CC):
  1. CC-Ch. 2: Trade Networks Excerpts
PDF
7.d Lab The Food Web notebook Quarto
This lab experience will be conducted on Colab
Click here to see it on Colab
  1. A data collection on food webs;
  2. the lab notebook, and
  3. its worked-out solution.
PDF
Jun 11 8.a Class Non-negative Matrix Factorization Quarto
  1. A simple notebook
  2. A tutorial on the direct implementation above; a slightly extended version is here.

For reference:
the Nature article;
the NIPS article, and
an IEEE Computer review article which explains applications in recommender systems.
PDF
8.b Class Rating, ranking: Keener Quarto
  1. LM-ch.4: Keener's method
PDF
8.c Class The Internet network Quarto
  1. CC-Ch. 3: The Internet Excerpts
PDF
8.d Lab The Trade networks notebook. This lab experience will be conducted on Colab
Click here to see it on Colab
  1. a data collection on 2003 International trade data;
  2. the lab notebook, and
  3. its worked-out solution.
Jun 18 9.a Factorization Machines Quarto
For reference:
  1. The [Rendle, ICDM 2010] article.
  2. The [Rendle, TIST 2012] article.
PDF
9.b Lab Ratings & ranking: the Premier League case Click here to see it on Colab
  1. data
  2. Local image of the exercise notebook
  3. A solution notebook
PDF
9.c Class Self-organised networks: WWW, Wikipedia etc. Quarto
  1. CC-Ch. 4: WWW, Wikipedia etc. Excerpts
PDF
9.d Lab The Internet notebook Click here to see it on Colab
  1. data
  2. Download the Internet notebook exercise, and
  3. the solution notebook.
PDF
The WWW, Wikipedia and OSNs notebook Click here to see it on Colab
  1. data
  2. Download the WWW, Wikipedia exercise, and
  3. the solution notebook.
PDF
Jun 25 10.a Class Rating, ranking: Markov Chains Quarto
  1. LM-ch.6: Markov's method
PDF
10.b Lab Markov chains in action Click here to see it on Colab
  1. Download the data.
  2. Local image of the exercise notebook
  3. The solution notebook
PDF
10.c Class Financial Networks Quarto
  1. CC-Ch. 5: Financial Networks excerpt
PDF
10.d Lab The Financial networks notebook Click here to see it on Colab
  1. The local data for the Financial notebook.
  2. demonstration notebook.
    The notebook requires the Yahoo Finance (yf) module.
    Download lab instructions.
PDF
Jul 2 New! Online Final in-class test
New Free discussion

The presentations here were produced with Reveal.js (v. 5) or Remark.
To print Reveal.js presentations or to save them locally as PDF files, please follow their instructions or install and run decktape on your computer.
Mathematical formulae are rendered online by MathJax, so some security settings of your browser might need tuning.

A note on learning support from the department.
