<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="ru">
	<id>https://wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=Into_to_DataMining_and_Machine_Learning_2020_2021</id>
	<title>Into to DataMining and Machine Learning 2020 2021 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=Into_to_DataMining_and_Machine_Learning_2020_2021"/>
	<link rel="alternate" type="text/html" href="https://wikicshse.ru/index.php?title=Into_to_DataMining_and_Machine_Learning_2020_2021&amp;action=history"/>
	<updated>2026-06-06T13:28:42Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://wikicshse.ru/index.php?title=Into_to_DataMining_and_Machine_Learning_2020_2021&amp;diff=348&amp;oldid=prev</id>
		<title>imported&gt;Machine: /* Lecture on 09 February 2021 */</title>
		<link rel="alternate" type="text/html" href="https://wikicshse.ru/index.php?title=Into_to_DataMining_and_Machine_Learning_2020_2021&amp;diff=348&amp;oldid=prev"/>
		<updated>2021-06-23T07:45:29Z</updated>

		<summary type="html">&lt;p&gt;&lt;span class=&quot;autocomment&quot;&gt;Lecture on 09 February 2021&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Course: Introduction to Data Mining and Machine Learning (2020–2021) ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Lecturer:&amp;#039;&amp;#039;&amp;#039; Dmitry Ignatov&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;TA:&amp;#039;&amp;#039;&amp;#039; Stefan Nikolić&lt;br /&gt;
&lt;br /&gt;
All the materials are available via our t &lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Final mark formula&amp;#039;&amp;#039;&amp;#039;: FM = 0.8 Homeworks + 0.2 Exam. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Homeworks ===&lt;br /&gt;
&lt;br /&gt;
* Homework 1: Graph Spectral Clustering&lt;br /&gt;
* Homework 2: Classification or Frequent Itemset Mining or Clustering&lt;br /&gt;
* Homework 3: Recommender Systems&lt;br /&gt;
&lt;br /&gt;
=== Lecture on 12 January 2021===&lt;br /&gt;
&lt;br /&gt;
Intro slides. Course plan. Assessment criteria. ML&amp;amp;DM libraries. What to read and watch?&lt;br /&gt;
&lt;br /&gt;
Practice: demonstration with Orange.&lt;br /&gt;
&lt;br /&gt;
=== Lecture on 19 January 2021===&lt;br /&gt;
&lt;br /&gt;
Classification. One-rule. Naïve Bayes. kNN. Logistic Regression. Train-test split and cross-validation. Quality Metrics (TP, FP, TN, FN, Precision, Recall, F-measure, Accuracy).&lt;br /&gt;
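The quality metrics above are all computed from the four confusion-matrix counts. A minimal Python sketch, assuming binary labels with 1 as the positive class; the function name `binary_metrics` and the toy labels are hypothetical:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus precision, recall, F-measure (F1) and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1, "accuracy": accuracy}


metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(metrics)
```

On this toy example TP = 2, FP = 1, TN = 1 and FN = 1, so precision, recall and F-measure all equal 2/3 and accuracy is 0.6.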
&lt;br /&gt;
Practice: demonstration with Orange.&lt;br /&gt;
&lt;br /&gt;
=== Lecture on 26 January 2021===&lt;br /&gt;
&lt;br /&gt;
Classification (continued). Quality metrics. ROC curves. &lt;br /&gt;
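A ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered through the classifier's scores, and the area under it (AUC) equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal Python sketch that computes AUC both ways; the function names and toy scores are illustrative, and `roc_points` assumes there are no tied scores:

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold past each score in turn."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, y_true), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_by_ranking(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_trapezoid(roc_points(labels, scores)), auc_by_ranking(labels, scores))
```

Both computations give 7/9 (about 0.778) on this toy data.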
&lt;br /&gt;
Practice: demonstration with Orange.&lt;br /&gt;
&lt;br /&gt;
=== Lecture on 2 February 2021===&lt;br /&gt;
&lt;br /&gt;
Introduction to Clustering. Taxonomy of clustering methods. K-means. K-medoids. Fuzzy C-means. Types of distance metrics. Hierarchical clustering. DBScan.&lt;br /&gt;
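K-means alternates an assignment step and an update step until the assignments stop changing. A minimal pure-Python sketch of Lloyd's iteration, assuming Euclidean distance and caller-supplied initial centroids (k-means++ or random restarts would normally be used for initialisation); the names and data are illustrative. K-medoids differs in requiring each centre to be one of the data points.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean updates until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid (Euclidean).
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


pts = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8, 9)]
labels, centres = kmeans(pts, centroids=[(1, 1), (8, 8)])
print(labels, centres)
```

Here the two well-separated groups end up as clusters 0 and 1.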
&lt;br /&gt;
Practice: DBScan Demo.&lt;br /&gt;
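DBScan calls a point a core point when at least `min_pts` points (itself included) lie within distance `eps`; clusters grow through chains of core points, border points join a neighbouring cluster without expanding it, and the remaining points are noise. A minimal pure-Python sketch with brute-force neighbour search; the parameter names and toy points are assumptions, not the code from the course demo:

```python
import math


def dbscan(points, eps, min_pts):
    """DBSCAN labels: cluster ids 0, 1, ... and -1 for noise."""
    n = len(points)
    # eps-neighbourhood of every point, the point itself included (O(n^2) brute force).
    neighbours = [[j for j in range(n) if eps >= math.dist(points[i], points[j])] for i in range(n)]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if min_pts > len(neighbours[i]):
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep growing through it
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

With `eps=1.5` and `min_pts=3`, the two dense groups become clusters 0 and 1, and the isolated point (5, 5) is labelled -1 (noise).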
&lt;br /&gt;
=== Lecture on 09 February 2021===&lt;br /&gt;
&lt;br /&gt;
* Introduction to Clustering (continued). Density-based techniques. DBScan and Mean-shift.&lt;br /&gt;
&lt;br /&gt;
* Graph and spectral clustering. Min-cuts and normalized cuts. Laplacian matrix. Fiedler vector. Two-mode Spectral Clustering (Spectral Biclustering). Applications: Web Advertising, Community detection in Social Networks, Music Recommendations.&lt;br /&gt;
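For the unnormalised Laplacian L = D - W (D is the diagonal degree matrix, W the adjacency matrix), the eigenvector of the second-smallest eigenvalue is the Fiedler vector, and splitting the vertices by its sign gives a relaxed solution of the ratio-cut problem; normalised cuts use a normalised Laplacian instead. A minimal NumPy sketch on a hypothetical graph of two triangles joined by one edge; since an eigenvector's sign is arbitrary, which side receives label 1 may vary:

```python
import numpy as np


def fiedler_bipartition(adjacency):
    """Two-way spectral split by the sign of the Fiedler vector of L = D - W."""
    W = np.asarray(adjacency, dtype=float)
    L = np.diag(W.sum(axis=1)) - W  # unnormalised graph Laplacian
    eigenvalues, eigenvectors = np.linalg.eigh(L)  # symmetric L: eigenvalues in ascending order
    fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int), eigenvalues[1]


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
labels, algebraic_connectivity = fiedler_bipartition(W)
print(labels, algebraic_connectivity)
```

The second-smallest eigenvalue (the algebraic connectivity) of this graph is (5 - √17)/2, about 0.438.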
&lt;br /&gt;
=== Practice on 16 February 2021 ===&lt;br /&gt;
&lt;br /&gt;
Clustering with scikit-learn (k-means, hierarchical clustering, DBScan, MeanShift, Spectral Clustering).&lt;br /&gt;
&lt;br /&gt;
=== Lecture on 2 March 2021 ===&lt;br /&gt;
&lt;br /&gt;
Practice: Spectral clustering. &lt;br /&gt;
&lt;br /&gt;
Lecture: Decision tree learning. ID3. Information Entropy. Information gain. Gini coefficient and index. Overfitting and pruning. Decision trees for numeric data. Oblivious decision trees. Regression trees.  &lt;br /&gt;
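Both split criteria can be computed from class counts: ID3 chooses the attribute with the largest information gain (the decrease in entropy), while CART-style trees use the decrease in Gini impurity. A minimal pure-Python sketch; the weather-style table is made up for illustration:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, attr, target, impurity=entropy):
    """Impurity decrease from splitting rows on attr; with entropy this is ID3's information gain."""
    parent = [r[target] for r in rows]
    children = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == value]
        children += len(subset) / len(rows) * impurity(subset)  # weighted child impurity
    return impurity(parent) - children


rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
for attr in ("outlook", "windy"):
    print(attr, split_gain(rows, attr, "play"), split_gain(rows, attr, "play", impurity=gini))
```

Here outlook has information gain 2/3 and Gini gain 1/3, well above windy (about 0.082 and 0.056), so it would be chosen as the root split.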
&lt;br /&gt;
=== Lecture on 9 March 2021 ===&lt;br /&gt;
&lt;br /&gt;
Frequent Itemsets. Association Rules. Algorithms: Apriori, FP-growth. Interestingness measures. Closed and maximal itemsets. &lt;br /&gt;
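Apriori relies on downward closure: every subset of a frequent itemset is frequent, so candidate (k+1)-itemsets are built only from frequent k-itemsets and pruned if any k-subset is infrequent. A minimal pure-Python sketch that also derives rules with confidence and lift (one common interestingness measure); support is expressed as a fraction of transactions, and the baskets are invented:

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Frequent itemsets (frozenset -> support fraction), found level by level."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset.issubset(t)) / n

    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        frequent.update((s, support(s)) for s in level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure), then keep candidates that reach min_support.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c) >= min_support]
    return frequent


def association_rules(frequent, min_confidence):
    """Rules lhs -> rhs with (support, confidence, lift) from every frequent itemset."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                confidence = supp / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    lift = confidence / frequent[rhs]
                    rules.append((set(lhs), set(rhs), supp, confidence, lift))
    return rules


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
           {"milk"}, {"bread", "milk"}]
freq = apriori(baskets, min_support=0.4)
for lhs, rhs, supp, conf, lift in association_rules(freq, min_confidence=0.7):
    print(lhs, "->", rhs, f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

With minimum support 0.4 and minimum confidence 0.7 this yields bread -> milk and milk -> bread (confidence 0.75, lift below 1) and butter -> bread (confidence 1.0, lift 1.25).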
&lt;br /&gt;
=== Lecture + Practice on 16 March 2021 ===&lt;br /&gt;
&lt;br /&gt;
Frequent Itemset Mining (continued). Applications: 1) Taxonomies of Website Visitors and 2) Web advertising.&lt;br /&gt;
&lt;br /&gt;
Exercises. Frequent Itemsets. FP-growth. Closed itemsets.&lt;br /&gt;
&lt;br /&gt;
Practice. Orange, SPMF, Concept Explorer.&lt;br /&gt;
&lt;br /&gt;
=== Practice on 6 April 2021 ===&lt;br /&gt;
&lt;br /&gt;
Practice. Scikit-learn tutorial on kNN, Decision Trees, Naïve Bayes, Logistic Regression, SVM, etc.&lt;br /&gt;
&lt;br /&gt;
=== Lecture on 13 April 2021 ===&lt;br /&gt;
&lt;br /&gt;
Introduction to Recommender Systems. Taxonomy of Recommender Systems (non-personalised, content-based, collaborative filtering, hybrid, etc.). Real examples. User-based and item-based collaborative filtering. Bimodal cross-validation.&lt;br /&gt;
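User-based collaborative filtering predicts a rating from the ratings that the most similar users gave the same item. A minimal pure-Python sketch, assuming cosine similarity over co-rated items and mean-centred aggregation over the k nearest neighbours (Pearson correlation is a common alternative); the ratings are invented. The item-based variant is symmetric: it compares item columns instead of user rows.

```python
import math


def cosine(ru, rv):
    """Cosine similarity of two users' rating dicts over their co-rated items."""
    common = set(ru).intersection(rv)
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    norm_u = math.sqrt(sum(ru[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_user_based(ratings, user, item, k=2):
    """Mean-centred prediction from the k most similar users who have rated the item."""
    mean = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    candidates = [(cosine(ratings[user], ratings[v]), v)
                  for v in ratings if v != user and item in ratings[v]]
    neighbours = sorted(candidates, reverse=True)[:k]
    num = sum(sim * (ratings[v][item] - mean[v]) for sim, v in neighbours)
    den = sum(abs(sim) for sim, _ in neighbours)
    return mean[user] + (num / den if den else 0.0)


ratings = {
    "Ann": {"A": 5, "B": 3, "C": 4},
    "Bob": {"A": 4, "B": 2, "C": 4, "D": 5},
    "Cat": {"A": 1, "B": 5, "D": 2},
    "Dan": {"B": 4, "C": 1, "D": 1},
}
print(round(predict_user_based(ratings, "Ann", "D"), 2))
```

Ann's two nearest neighbours who rated D are Bob and Dan, and the predicted rating is about 4.26.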
&lt;br /&gt;
=== Lecture + Practice on 25 April 2021 ===&lt;br /&gt;
&lt;br /&gt;
Practice: User-based and item-based collaborative filtering with Python and MovieLens.&lt;br /&gt;
&lt;br /&gt;
Case-study: Non-negative Matrix Factorisation, Boolean Matrix Factorisation vs. SVD in Collaborative Filtering. &lt;br /&gt;
&lt;br /&gt;
Lecture: Advanced factorisation models: PureSVD, SVD++, timeSVD, ALS.&lt;br /&gt;
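PureSVD treats unknown ratings as zeros, takes a rank-k truncated SVD R ≈ U_k S_k Q_k^T, and scores items for every user as R Q_k Q_k^T. A minimal NumPy sketch on an invented 4 × 4 matrix; the function name is hypothetical:

```python
import numpy as np


def pure_svd_scores(R, k):
    """PureSVD: scores = R Q_k Q_k^T, with Q_k the top-k right singular vectors of the zero-filled R."""
    R = np.asarray(R, dtype=float)
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    Q = Vt[:k].T  # item factors, shape (n_items, k)
    return R @ Q @ Q.T


# Rows are users, columns are items; 0 marks an unknown rating.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
])
print(np.round(pure_svd_scores(R, k=2), 2))
```

With k = 2, the first user's row becomes 4.5, 4.5, 0.5, 0.5: the two item groups are captured, and the unknown item 2 gets a low score. With k equal to the number of items, the scores reproduce R exactly.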
&lt;br /&gt;
=== Lecture on 11 May 2021 ===&lt;br /&gt;
&lt;br /&gt;
* Advanced factorisation models: Factorisation Machines (continued). &lt;br /&gt;
* Supervised Ensemble Learning. Bias-Variance decomposition. Bagging. Random Forest. Boosting for classification (AdaBoost) and regression. Stacking and Blending. Recommendation of Classifiers. &lt;br /&gt;
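AdaBoost trains weak learners sequentially: after each round, misclassified examples have their weights multiplied by exp(alpha) and correctly classified ones by exp(-alpha), with alpha = ln((1 - err) / err) / 2, so the next learner focuses on the mistakes. A minimal pure-Python sketch with one-feature decision stumps and labels in {-1, +1}; the toy data, with positives at both ends, cannot be separated by any single stump:

```python
import math


def stump(x, threshold, polarity):
    """One-feature decision stump returning +1 or -1."""
    return polarity if x > threshold else -polarity


def adaboost(xs, ys, rounds):
    """Discrete AdaBoost with decision stumps; ys must be +1/-1. Returns [(alpha, threshold, polarity)]."""
    n = len(xs)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Choose the stump with the smallest weighted training error.
        err, t, pol = min(
            (sum(wi for xi, yi, wi in zip(xs, ys, w) if stump(xi, t, p) != yi), t, p)
            for t in sorted(set(xs)) for p in (1, -1))
        err = max(err, 1e-10)  # guard against log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, pol))
        # Misclassified points are multiplied by exp(alpha), correct ones by exp(-alpha).
        w = [wi * math.exp(-alpha * yi * stump(xi, t, pol)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump(x, t, pol) for alpha, t, pol in model)
    return 1 if score > 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump separates this
for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    print(rounds, [predict(model, x) for x in xs])
```

After one round the last two points are still misclassified; after three rounds all six training points are classified correctly.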
&lt;br /&gt;
=== Practice plus Lecture on 18 May 2021 ===&lt;br /&gt;
&lt;br /&gt;
Practice: Bagging, Pasting, Random Projections, and Patching. Random Forest and Extra Trees. Gradient Boosting. Voting. &lt;br /&gt;
&lt;br /&gt;
Lecture on Gradient Boosting.&lt;br /&gt;
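For squared loss (y - F(x))^2 / 2, the negative gradient with respect to F(x) is just the residual y - F(x), so each boosting round fits a small regression tree to the current residuals and adds it, shrunk by a learning rate. A minimal pure-Python sketch with one-feature regression stumps; `n_rounds`, `learning_rate` and the data are illustrative assumptions, and libraries such as scikit-learn's `GradientBoostingRegressor` implement the general algorithm:

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: best split threshold, mean residual on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if t >= x]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or best[0] > sse:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if t >= x else right_mean


def gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: every stump is fitted to the current residuals."""
    f0 = sum(ys) / len(ys)  # initial constant model: the mean minimises squared loss
    stumps = []
    pred = [f0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]  # negative gradient of (y - F)^2 / 2
        h = fit_stump(xs, residuals)
        stumps.append(h)
        pred = [p + learning_rate * h(x) for p, x in zip(pred, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in stumps)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
model = gradient_boosting(xs, ys)
print([round(model(x), 3) for x in xs])
```

With learning rate 0.1, the residuals here shrink by a factor of 0.9 per round, so after 50 rounds the predictions are about 1.01 and 4.99.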
&lt;br /&gt;
=== Exam ===&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Date&amp;#039;&amp;#039;&amp;#039;: 29.06.2021. &amp;#039;&amp;#039;&amp;#039;Starting time&amp;#039;&amp;#039;&amp;#039;: 11:00. &amp;#039;&amp;#039;&amp;#039;Location&amp;#039;&amp;#039;&amp;#039;: remote exam (see the channel announcements).&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Format&amp;#039;&amp;#039;&amp;#039;: One-to-one meeting in Zoom with the lecturer or the course TA. Typically, you will be given one quick theoretical question and one small finger exercise.&lt;br /&gt;
If your marks for the homeworks are satisfactory, taking the exam is optional but recommended.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Questions&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
Questions on what each studied method is and how it works.&lt;br /&gt;
&lt;br /&gt;
# Taxonomy of DM and ML methods. &lt;br /&gt;
# Classification. One-rule and Decision Stumps. Decision Trees. ID3 algorithm. &lt;br /&gt;
# Classification. Naïve Bayes. Smoothing. &lt;br /&gt;
# Classification. kNN.&lt;br /&gt;
# Classification. Logistic regression.&lt;br /&gt;
# Classification quality metrics. ROC and AUC. &lt;br /&gt;
# Clustering. k-means and k-medoids. Fuzzy c-means. &lt;br /&gt;
# Clustering. Hierarchical clustering.&lt;br /&gt;
# Clustering. DBScan and Mean-Shift.&lt;br /&gt;
# Clustering quality metrics. Silhouette. Elbow method. Cophenetic distance. Calinski-Harabasz score.&lt;br /&gt;
# Spectral Clustering. Graph Laplacian and min-cuts.&lt;br /&gt;
# Decision Trees. ID3. Information gain and Gini index. &lt;br /&gt;
# Ensemble Learning. Bias and variance decomposition. Overfitting. &lt;br /&gt;
# Ensemble Learning. Bagging.&lt;br /&gt;
# Ensemble Learning. Boosting. AdaBoost.&lt;br /&gt;
# Ensemble Learning. Random Forest.&lt;br /&gt;
# Ensemble Learning. Gradient Boosting.&lt;br /&gt;
# Data Mining. Frequent Itemset Mining and Association Rules. Interestingness Measures. Closed and Maximal Itemsets.&lt;br /&gt;
# Data Mining. Frequent Itemset Mining and Association Rules. Apriori vs. FP-growth.&lt;br /&gt;
# Recommender Systems. Collaborative Filtering. Item-based and user-based techniques. Quality metrics and bimodal cross-validation.&lt;br /&gt;
# Recommender Systems. NMF, Boolean Matrix Factorisation and SVD for Collaborative Filtering.&lt;br /&gt;
# Recommender Systems. Advances in matrix factorisation: PureSVD, SVD++, timeSVD, ALS, Factorisation Machines.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Small tasks&amp;#039;&amp;#039;&amp;#039;. &lt;br /&gt;
&lt;br /&gt;
Examples of finger exercises to be solved with pen and paper.&lt;br /&gt;
&lt;br /&gt;
# Given a small 5 × 4 dataset, find its most informative attributes based on Information Gain and Gini Index.&lt;br /&gt;
# Given a toy set of transactions, find at least three association rules with a given minimum support and confidence.&lt;br /&gt;
# Given a tiny user-item table, find the top three recommendations for a given user by user-based and item-based approaches. &lt;br /&gt;
# Given a small matrix of user-item interactions, factorise it into a product of Boolean matrices, preferably with a smaller inner dimension.&lt;/div&gt;</summary>
		<author><name>imported&gt;Machine</name></author>
	</entry>
</feed>