Charles X. Ling, Professor
Department of Computer Science
The University of
I will first give a quick and comprehensive review of basic data mining tasks (and algorithms), including:
- Classification (decision trees, neural networks, etc.)
- Regression (regression trees, k-NN, etc.)
- Clustering (k-means, etc.)
- Association (Apriori)
Then I will teach some recent, advanced topics in data mining and machine learning, including:
- Support Vector Machines (SVM)
- Semi-supervised learning (co-training, EM, etc.)
- Ensemble learning (bagging, boosting)
- Cost-sensitive learning
- Active learning
- Bayesian learning
- Feature selection and extraction
I will also discuss many real-world applications of data mining:
- Mining market and stock data
- Direct marketing
- Credit risk prediction
- Action mining
- Mining medical data for medical diagnosis
- Profit mining
- Text mining
- Mining for search engines
- Social network discovery
The lecture will be delivered in a mixture of English and Chinese. Students are advised to attend all classes and follow the lecture notes closely. Active class participation and discussions are highly encouraged.
On the last day of lectures, a discussion salon will be held with all students, and future research collaboration will be encouraged.
Graduate students who take this course for credit must complete a hands-on project and write an exam at the end.
Basic knowledge of data mining, machine learning, and/or Artificial Intelligence (4th-year undergraduate or Master's level) will help students follow the lectures.
There is NO required textbook for this course because it is an advanced course. Most of the course material will come from research papers, PowerPoint slides, and relevant reference books.
Data Mining: Concepts and Techniques (2nd edition). By Jiawei Han and Micheline Kamber. Morgan Kaufmann. 2006.
Machine Learning, by Tom Mitchell, McGraw Hill, 1997.
The WEKA package (see http://www.cs.waikato.ac.nz/ml/weka/) is the most popular and powerful data mining tool used by machine learning researchers and data mining practitioners around the world. It is freely downloadable, with open-source code in Java. It is also the accompanying software for the book “Data Mining: Practical Machine Learning Tools and Techniques”.
Articles from the Journal of Machine Learning Research, Machine Learning, KDD, IEEE TKDE, and similar venues may be used in class.
Papers from KDD, IEEE ICDM, ICML, ECML, PAKDD, PKDD, and similar conferences may be used in the lectures.
Useful website on data mining: http://www.kdnuggets.com/