Mining Uncertain and Probabilistic Data for Big Data Analytics

Uncertain data is inherent in many important applications, particularly in big data analytics settings such as environmental surveillance, healthcare informatics, customer-relationship management, market analysis, and quantitative economics research. When tackling big data, it is almost impossible to avoid modeling and analyzing uncertainty and probability. Analyzing and mining large collections of uncertain data has therefore become an important task and has attracted growing interest from the data mining and industrial application communities.

In this tutorial, with big data analytics as the overarching context, we will present a systematic yet compact review of mining uncertain and probabilistic data. The review covers motivations and application examples, problems, challenges, fundamental principles, state-of-the-art methods, and interesting open problems and future directions. We will emphasize big data analytics applications, the connections among various mining and analytics tasks, fundamental principles, and open problems.

We assume that the audience is familiar with basic concepts of probability and statistics. However, no deep background in statistics, sampling, probability, or other mathematical principles is assumed. We will use plenty of examples to explain the ideas and intuitions.

Tutorial slides and other supplementary materials will be made available here once they are ready.