Welcome to The Workshop

Big Data applications are wide-reaching, with potential for innovation in many disciplines, such as the environment, medicine, engineering, business, the social sciences, and the humanities. To embrace these unprecedented opportunities and meet the grand challenges, Simon Fraser University and Zhejiang University, two leading universities in Canada and China, respectively, are determined to build on their long-standing strategic collaboration and to create a new synergy of in-depth and extensive collaboration in Big Data research and development. The First Joint Workshop on Big Data Research is a forum where researchers and students from the two universities will meet, exchange ideas, and explore opportunities.

Workshop Schedule

Thursday, June 25
18:00-20:00  Welcome Reception

Friday, June 26
08:45-09:15  Opening
09:15-10:15  Keynote (Wanli Min, Director of Data Science, Alibaba)
10:15-10:30  Coffee Break
10:30-12:30  Research Session: Networks and Systems
12:30-14:00  Lunch
14:00-15:30  Graduate Student Poster Session
15:30-16:00  Coffee Break
16:00-17:30  Breakout Sessions on Potential Research Collaborations
17:30-20:00  Banquet

Saturday, June 27
09:00-11:00  Tutorial
11:00-11:30  Coffee Break
11:30-12:30  Research Session: Multimedia
12:30-14:00  Lunch
14:00-15:30  Research Session: Data Mining and Management
15:30-16:00  Coffee Break
16:00-17:30  Business Meeting: How can SFU and ZJU strengthen the collaboration on Big Data research? (All are invited)
17:30-20:00  Farewell Dinner

Advance Schedule

Friday, June 26
Research Session
Networks and Systems
10:30-12:30


In this talk, I will give an overview of recent research in my lab on big data computing, crowdsourcing, social networking, online gaming, structural health monitoring, etc.
I will then present a systematic online prediction method (Social-Forecast) that is capable of accurately forecasting the popularity of videos promoted by social media. Social-Forecast explicitly considers the dynamically changing and evolving propagation patterns of videos in social media when making popularity forecasts, and is therefore situation- and context-aware. We analytically bound the prediction performance loss of Social-Forecast compared to that of an omniscient oracle and prove that the bound is sublinear in the number of video arrivals. We conduct extensive experiments using real-world data traces collected from videos shared on RenRen, one of the largest online social networks in China. These experiments show that our proposed method outperforms existing view-based approaches to popularity prediction (which are not context-aware) by more than 30% in terms of prediction rewards.
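For readers less familiar with the term, a performance loss that is "sublinear in the number of arrivals" is usually stated as a regret bound. The formulation below is a generic one, not necessarily the exact quantity bounded in the talk; the symbols $T$ (number of video arrivals), $a_t$ (the forecast made for the $t$-th video), $a_t^{\ast}$ (the omniscient oracle's forecast), and $r_t$ (the prediction reward) are our own assumed notation.

```latex
R(T) = \sum_{t=1}^{T} \Big( \mathbb{E}\big[r_t(a_t^{\ast})\big] - \mathbb{E}\big[r_t(a_t)\big] \Big),
\qquad
R(T) = o(T) \;\Longrightarrow\; \lim_{T \to \infty} \frac{R(T)}{T} = 0
```

In words: a sublinear bound means the average per-video loss relative to the oracle vanishes as more videos arrive.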
Software-defined networking (SDN) has been a hot research topic in the networking community. Software design for sensor networks differs in many ways from software design for the Internet. In this talk, we focus on the design of a software-defined architecture for wireless sensor networks. We present a novel taxonomy for Software Defined Sensor Networks (SDSN) according to different abstractions of functionalities. We examine the major challenges towards a generic and efficient SDSN architecture. We introduce some useful techniques that are either adopted in existing solutions or can be used to address some of these challenges. We will also introduce our recent work on software-defined sensing and software updates in sensor networks. We believe a well-designed SDSN architecture can greatly facilitate software development for deployed networks, allowing rapid technical innovation.
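The core separation that "software-defined" implies, a simple programmable data plane on each node configured by a central controller, can be sketched with the match-action table abstraction borrowed from wired SDN (OpenFlow-style). This is a minimal sketch under that assumption, not the speaker's SDSN architecture; the classes `Rule`, `SensorNode`, and `Controller`, the packet fields, and the action strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical packet model: header field name -> integer value.
Packet = Dict[str, int]

@dataclass
class Rule:
    """A match-action rule installed by the controller."""
    priority: int
    match: Dict[str, int]  # required field values; absent fields act as wildcards
    action: str            # hypothetical action strings, e.g. "forward:3" or "drop"

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())

@dataclass
class SensorNode:
    """Data plane only: the node matches packets and applies actions; policy lives elsewhere."""
    node_id: int
    table: List[Rule] = field(default_factory=list)

    def install(self, rule: Rule) -> None:
        self.table.append(rule)
        # Highest priority first, so the first match found is the winning rule.
        self.table.sort(key=lambda r: r.priority, reverse=True)

    def handle(self, pkt: Packet) -> str:
        for rule in self.table:
            if rule.matches(pkt):
                return rule.action
        return "to_controller"  # table miss: defer to the controller

class Controller:
    """Control plane: changes node behavior by installing rules rather than new firmware."""
    def __init__(self, nodes: List[SensorNode]) -> None:
        self.nodes = {n.node_id: n for n in nodes}

    def push(self, node_id: int, rule: Rule) -> None:
        self.nodes[node_id].install(rule)
```

For example, after pushing `Rule(10, {"sensor_type": 2}, "forward:3")` and a low-priority catch-all `Rule(1, {}, "drop")`, a node forwards readings of sensor type 2 and drops everything else. The abstract also covers software-defined sensing and software updates, which this table-only sketch does not model.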
Recent advances in the multidisciplinary fields of brain-machine interfaces, artificial intelligence, computational neuroscience, microelectronics, and neurophysiology signal a growing convergence between machines and living beings. Brain-machine interfaces (BMIs) enable direct communication pathways between the brain and an external device, making it possible to connect organic and computing parts at the signal level. A cyborg is a biological-machine system consisting of both organic and computing components. Cyborg intelligence aims to deeply integrate machine intelligence with biological intelligence by connecting machines and living beings via BMIs, enhancing strengths and compensating for weaknesses by combining biological cognitive capability with machine computational capability. This talk will introduce the concept, architectures, and applications of cyborg intelligence.
11:30-12:00
The Small Data Problem
Dr. Arrvindh Shriraman
Today, power constraints determine our ability to keep compute units active and busy. By 2022, we will only be able to keep one fifth of a chip active; the other four fifths will be dark (shut off) unless we reduce energy consumption. Interestingly, storing and moving the data used and produced by a computation consumes more energy than the computation itself. Whether for multicores, GPUs, or fixed-function accelerators, how we move data and feed the compute units has a critical impact on the programming model and on compute efficiency. We observe that, unlike the latency overhead of data movement, which can potentially be hidden, the energy overhead dictates that we fundamentally reduce waste in the memory hierarchy. I will provide an overview of the research in my group that adapts data storage and movement to application characteristics and improves energy efficiency.
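The two quantitative claims in this abstract, that only a fraction of a chip can be powered at once and that data movement can dominate energy, reduce to simple arithmetic. The sketch below is a toy model with assumed inputs; `EnergyModel`, `active_fraction`, and every cost parameter are hypothetical, and the example numbers are chosen only to reproduce the one-fifth figure, not taken from any measurement.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class EnergyModel:
    """Toy energy model; both per-unit costs are hypothetical inputs, not measurements."""
    pj_per_op: float          # assumed energy per arithmetic operation, in picojoules
    pj_per_byte_moved: float  # assumed energy per byte moved through the memory hierarchy

    def breakdown(self, ops: int, bytes_moved: int) -> Dict[str, float]:
        compute = ops * self.pj_per_op
        movement = bytes_moved * self.pj_per_byte_moved
        total = compute + movement
        return {
            "compute_pj": compute,
            "movement_pj": movement,
            "movement_fraction": movement / total if total else 0.0,
        }

def active_fraction(power_budget_w: float, units: int, watts_per_unit: float) -> float:
    """Fraction of compute units that fit in the power budget; the remainder stays dark."""
    return min(1.0, power_budget_w / (units * watts_per_unit))
```

With these made-up inputs, `active_fraction(20.0, 100, 1.0)` gives 0.2, and `EnergyModel(1.0, 10.0).breakdown(ops=1000, bytes_moved=400)` attributes 80% of the energy to data movement; halving `bytes_moved` cuts the total from 5000 pJ to 3000 pJ without touching the computation.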
Graduate Student Poster Session
14:00-15:30
Students from Zhejiang University
Students from Simon Fraser University
Breakout Sessions on Potential Research Collaborations
16:00-17:30


Networks and Systems
Coordinators:
Dr. Jiangchuan Liu (SFU)
Dr. Arrvindh Shriraman (SFU)
Dr. Wei Dong (ZJU)
Multimedia and Machine Learning
Coordinators:
Dr. Ze-Nian Li (SFU)
Dr. Gang Pan (ZJU)
Dr. Fei Wu (ZJU)
Data Mining and Management
Coordinators:
Dr. Martin Ester (SFU)
Dr. Jian Pei (SFU)
Dr. Xiaofei He (ZJU)
Dr. Yunjun Gao (ZJU)
Dr. Deng Cai (ZJU)
Saturday, June 27
Tutorial
09:00-11:00


Abstract:
With the emergence of online social networks, academia and industry have explored ways to exploit the information in social networks to improve the quality of recommendations and to support new recommendation tasks. The underlying motivation is to capture the effects that govern the evolution of social networks, i.e., social influence, selection, correlational influence, and transitivity, to enhance the typically very sparse rating matrix. Recommender systems that exploit a social network promise to outperform traditional recommenders, particularly for cold-start (new) users who have not yet provided enough information about their preferences. After introducing the motivation and some practical applications, we discuss social networks and the main factors affecting their evolution. We then review state-of-the-art methods for item recommendation in social networks, both memory-based and model-based approaches, in particular matrix factorization. We discuss friend recommendation, an important recommendation task that is unique to social networks. We conclude the tutorial with a discussion of future research directions, such as explanation of recommendations and the combination of machine and human computation.
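The model-based family mentioned above can be illustrated with matrix factorization plus a social regularizer. The sketch below assumes one simple form of regularizer, pulling each user's latent vector toward the average of their friends' vectors; it is not the tutorial's specific method, and `train_social_mf`, `predict`, and all hyperparameter values are illustrative. It shows why social information helps cold-start users: a user with no ratings still receives a meaningful vector from their friends.

```python
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]

def predict(U: Matrix, V: Matrix, u: int, i: int) -> float:
    """Predicted rating = dot product of user and item latent vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))

def train_social_mf(
    ratings: List[Tuple[int, int, float]],  # (user, item, rating) triples
    friends: Dict[int, List[int]],          # user -> friends (hypothetical social graph)
    n_users: int,
    n_items: int,
    k: int = 4,
    lr: float = 0.02,
    reg: float = 0.05,
    social_reg: float = 0.5,
    epochs: int = 200,
    seed: int = 0,
) -> Tuple[Matrix, Matrix]:
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)  # copy so shuffling does not mutate the caller's list

    for _ in range(epochs):
        rng.shuffle(data)
        # Rating step: plain SGD on squared error with L2 regularization.
        for u, i, r in data:
            err = r - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
        # Social step: pull every user with friends toward the mean of their
        # friends' vectors. This also moves cold-start users who have no ratings.
        for u, fs in friends.items():
            if not fs:
                continue
            for f in range(k):
                friend_mean = sum(U[w][f] for w in fs) / len(fs)
                U[u][f] -= lr * social_reg * (U[u][f] - friend_mean)
    return U, V
```

For example, if users 0 and 1 both rate item 0 highly and item 1 poorly, a cold-start user 2 who is friends with both ends up with a clearly higher predicted rating for item 0 than for item 1, even though user 2 has rated nothing.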
Short Bio:
Martin Ester is Professor of Computing Science at Simon Fraser University in Burnaby, British Columbia. He received his Diplom (M.Sc.) in Computer Science from the University of Dortmund, Germany, in 1984 and his Ph.D. in Computer Science from ETH Zurich, Switzerland, in 1990. He is a co-director of SFU's Databases and Data Mining Laboratory. His research has been published in the top venues of his field, such as KDD, WWW, and RecSys. He received the KDD 2014 Test of Time Award for his work on DBSCAN. Recently, he served as PC Co-Chair of ASONAM 2014 and RecSys 2014.
Research Session
Multimedia
11:30-12:30


Object detection and recognition aims to localize visual objects and recognize their classes/identities and activities in images and videos. Over the last decades, the amount of image and video data has been growing exponentially, which poses great challenges to Computer Vision and Multimedia research. Studies of hierarchical structures in biological vision have long influenced computer vision. In spite of good progress, the general object detection and recognition problems remain unsolved. In recent years, many researchers have shifted their focus to specific methods for specific tasks. Often, rather "flat" schemes are used that involve simple feature descriptors and task-dependent machine learning algorithms. Although some of these methods are very effective for their specific tasks, they inevitably have serious limitations and often work well only on the datasets they were trained on. This talk will survey some of the research in this area and offer some insights into possible solutions.
Cross-modal retrieval is an active research topic that is essential to many applications involving multi-modal data. Discovering an appropriate representation (embedding) for multi-modal data is key to improving cross-media retrieval. This talk will first introduce multi-modal embedding in terms of the latent structures (i.e., topics) of multi-modal data via mutual topic reinforce modeling. Next, a list-wise ranking-based method for multi-modal embedding will be described. Finally, the talk will focus on multi-modal embedding using deep learning.
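The common idea behind these embeddings is that each modality is mapped into one shared space where items from different modalities can be compared directly. The minimal sketch below assumes the two projections are already given as plain matrices; learning them (by topic modeling, list-wise ranking, or deep networks, as the talk describes) is out of scope, and `project`, `cosine`, and `retrieve` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]

def project(W: Matrix, x: Sequence[float]) -> Vector:
    """Linear projection W @ x into the shared embedding space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_text: Sequence[float], images: List[Sequence[float]],
             W_text: Matrix, W_img: Matrix, top_k: int = 3) -> List[int]:
    """Rank images for a text query by cosine similarity in the shared space."""
    q = project(W_text, query_text)
    scores = [(cosine(q, project(W_img, img)), idx) for idx, img in enumerate(images)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scores[:top_k]]
```

The text and image feature vectors may have different dimensionalities; only the projected vectors need to share a dimension. In a trained system the projections would place matching image-text pairs close together; here they are fixed by hand so the ranking logic can be read on its own.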
Research Session
Data Mining and Management
14:00-15:30


Recently, hashing techniques have been widely applied to approximate nearest neighbor search in many real applications. The basic idea of these approaches is to generate binary codes for data points that preserve the similarity between any two of them. Given a query, instead of performing a linear scan of the entire database, a hashing method can perform a linear scan of only the points whose Hamming distance to the query is at most $r_h$, where $r_h$ is a constant. However, in order to find the true nearest neighbors, both the locating time and the linear scan time grow as $O\left(\sum_{i=0}^{r_h} \binom{c}{i}\right)$, where $c$ is the code length, and therefore increase exponentially as $r_h$ increases. To address this limitation, we propose a novel algorithm named iterative expanding hashing, which builds an auxiliary index based on an offline-constructed nearest neighbor table to avoid a large $r_h$. This auxiliary index can be easily combined with all traditional hashing methods. Extensive experimental results on various real large-scale datasets demonstrate the superiority of the proposed approach.
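The cost described above comes from the baseline hash lookup: probing every bucket within Hamming radius $r_h$ of the query code, which requires $\sum_{i=0}^{r_h}\binom{c}{i}$ probes for $c$-bit codes. The sketch below implements only that baseline to make the growth concrete; it does not implement iterative expanding hashing, and the table layout (an integer code mapped to a list of point ids) and the function names are assumptions.

```python
from itertools import combinations
from math import comb
from typing import Dict, Iterator, List

def hamming_ball(code: int, c: int, r: int) -> Iterator[int]:
    """Yield every c-bit code within Hamming distance r of `code`."""
    for d in range(r + 1):
        for bits in combinations(range(c), d):
            flipped = code
            for b in bits:
                flipped ^= 1 << b  # flip the chosen bit positions
            yield flipped

def probes(c: int, r: int) -> int:
    """Number of buckets probed: sum_{i=0}^{r} C(c, i)."""
    return sum(comb(c, i) for i in range(r + 1))

def lookup(table: Dict[int, List[int]], query_code: int, c: int, r: int) -> List[int]:
    """Collect ids of all points whose code is within radius r of the query code."""
    hits: List[int] = []
    for code in hamming_ball(query_code, c, r):
        hits.extend(table.get(code, []))
    return hits
```

For 64-bit codes, `probes(64, 1)` is 65 but `probes(64, 3)` is already 43,745 bucket lookups, which is the blow-up that motivates avoiding a large $r_h$.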
14:30-15:00
Big Data for Everyone
Dr. Jian Pei
Big Data poses grand opportunities and challenges for egocentric analytics. In this talk, I will discuss several interesting problems centered on egocentric queries and analysis on Big Data. We want to answer a series of natural questions that arise in several killer applications, such as "How is this patient similar to or different from the other Type II diabetes patients in the database?", "How is University X distinct from the other universities?", and "How is this residential property distinct from the others available in the market?" To answer such questions on Big Data, we have to search data of high dimensionality, high volume, and possibly high dynamics. I will present some preliminary research results and application case studies we obtained recently, as well as further challenges we have identified.
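One very simple way to make a question like "How is this patient different from the others?" concrete is to score each attribute by how unusual the query object's value is relative to the rest of the data. The sketch below uses per-attribute z-scores purely as an illustration; it is not the speaker's method, it ignores the high-dimensionality, high-volume, and dynamic aspects the abstract emphasizes, and `distinguishing_attributes` and the attribute names in any usage are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

def distinguishing_attributes(
    query: Dict[str, float],
    population: List[Dict[str, float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank attributes by how far the query deviates from the population (|z-score|)."""
    scored = []
    for attr, value in query.items():
        column = [row[attr] for row in population]
        sd = pstdev(column)
        if sd == 0:
            continue  # constant attribute: no contrast to report
        z = (value - mean(column)) / sd
        scored.append((attr, z))
    scored.sort(key=lambda t: abs(t[1]), reverse=True)
    return scored[:top_k]
```

The sign of each returned score tells whether the query is above or below the population average on that attribute.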
Spatial queries, including similarity search, similarity joins, and aggregate k-nearest-neighbor queries, are useful in many areas, such as multimedia retrieval, data integration, and computational biology, to name but a few. However, they are not yet well supported by commercial DBMSs. This may be due to the complex data types involved and the need for flexible similarity criteria in real applications. In this talk, we propose an efficient disk-based metric access method, the Space-filling curve and Pivot-based B+-tree (SPB-tree), to accelerate query processing and support a wide range of data types and similarity metrics. The SPB-tree uses a small set of so-called pivots to significantly reduce the number of distance computations, utilizes a space-filling curve to cluster the data into compact regions, thus improving storage efficiency, and employs a B+-tree with minimum bounding box information as the underlying index. The SPB-tree also uses a separate random access file to efficiently manage large and complex data. By design, the SPB-tree is easy to integrate into an existing DBMS. We also present efficient similarity search, similarity join, and aggregate k-nearest-neighbor query algorithms and corresponding cost models based on the SPB-tree. In addition, we develop a distributed geo-textual image retrieval and recommendation system (I2RS), which employs SPB-trees to index geo-textual images and uses metric similarity queries, including time-aware range and k-nearest-neighbor queries, to provide a variety of geo-textual image retrieval and recommendation services.
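Two of the ideas that do the work here, pivot-based pruning via the triangle inequality and clustering by a space-filling curve over pivot-space coordinates, can be shown in a toy in-memory index. This sketch is a simplification under stated assumptions, not the SPB-tree itself: it uses a Z-order (Morton) curve, which may differ from the curve the SPB-tree actually uses, keeps entries in a sorted Python list instead of a disk-based B+-tree, and omits the minimum bounding boxes and the random access file. `PivotIndex`, `z_order`, `euclidean`, and the `cell` grid width are hypothetical names.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Dist = Callable[[Sequence[float], Sequence[float]], float]

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.dist(a, b)

def z_order(cells: Sequence[int], bits: int) -> int:
    """Interleave the bits of integer cell coordinates into one Morton (Z-order) key."""
    key = 0
    for b in range(bits):
        for d, v in enumerate(cells):
            key |= ((v >> b) & 1) << (b * len(cells) + d)
    return key

class PivotIndex:
    """Toy metric index: pivot mapping, Z-order keys, and a sorted list standing in for a B+-tree."""

    def __init__(self, objects: List[Point], pivots: List[Point], dist: Dist,
                 cell: float, bits: int = 8) -> None:
        self.objects, self.pivots, self.dist = objects, pivots, dist
        self.cell, self.bits = cell, bits
        self.top = (1 << bits) - 1  # largest grid coordinate per pivot dimension
        entries = []
        for oid, o in enumerate(objects):
            phi = [dist(o, p) for p in pivots]  # coordinates in pivot space
            entries.append((z_order(self._cells(phi), bits), oid, phi))
        entries.sort()
        self.entries = entries
        self.keys = [e[0] for e in entries]
        self.distance_calls = 0  # object-to-query distances computed during queries

    def _cells(self, coords: Sequence[float]) -> List[int]:
        return [min(int(max(0.0, x) / self.cell), self.top) for x in coords]

    def range_query(self, q: Point, r: float) -> List[int]:
        q_phi = [self.dist(q, p) for p in self.pivots]
        # Any answer o satisfies |d(o,p) - d(q,p)| <= r for every pivot p,
        # so its pivot-space coordinates lie in the box q_phi +/- r.
        lo = z_order(self._cells([x - r for x in q_phi]), self.bits)
        hi = z_order(self._cells([x + r for x in q_phi]), self.bits)
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        result = []
        for _, oid, phi in self.entries[start:end]:
            lower = max(abs(a - b) for a, b in zip(q_phi, phi))
            if lower > r:
                continue  # pruned by the triangle inequality, no distance computed
            self.distance_calls += 1
            if self.dist(q, self.objects[oid]) <= r:
                result.append(oid)
        return sorted(result)
```

Because the Z-order key preserves componentwise order of cell coordinates, every object that can satisfy the query has a key between the keys of the query box's lower and upper corners, so scanning that key interval cannot miss a result; the triangle-inequality bound then discards most candidates before any real distance is computed.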