The First SFU-ZJU Joint Workshop on Big Data

Welcome to The Workshop

Big Data applications are wide-reaching, with potential for innovation in many disciplines, such as the environment, medicine, engineering, business, the social sciences and the humanities. To embrace the unprecedented opportunities and respond to the grand challenges, Simon Fraser University and Zhejiang University, two leading universities in Canada and China, respectively, are determined to build on their long-standing strategic collaboration and create a new synergy of in-depth and extensive collaboration in Big Data research and development. The First Joint Workshop on Big Data Research is a forum where researchers and students from the two universities will meet, exchange ideas, and explore opportunities.

Workshop Schedule

Thursday, June 25


18:00	Welcome Reception 18:00-20:00

Friday, June 26


08:45	Opening 08:45-09:15
09:15	Keynote (Wanli Min, Director of Data Science, Alibaba) 09:15-10:15
10:15	Coffee Break
10:30	Research Session: Networks and Systems 10:30-12:30
12:30	Lunch 12:30-14:00
14:00	Graduate Student Poster Session 14:00-15:30
15:30	Coffee Break
16:00	Breakout Sessions on Potential Research Collaborations 16:00-17:30
	Session 1 Networks and Systems	Session 2 Multimedia and Machine Learning	Session 3 Data Mining and Management
17:30	Banquet 17:30-20:00

Saturday, June 27


09:00	Tutorial: Data Mining and Recommendation in Social Networks 09:00-11:00
11:00	Coffee Break
11:30	Research Session: Multimedia 11:30-12:30
12:30	Lunch 12:30-14:00
14:00	Research Session: Data Mining and Management 14:00-15:30
15:30	Coffee Break
16:00	Business Meeting: How can SFU and ZJU strengthen their collaboration on Big Data research? (All are invited)
17:30	Farewell Dinner 17:30-20:00

Advance Schedule

Friday, June 26

Research Session

Networks and Systems

10:30-12:30

10:30-11:00
Timely Video Popularity Forecasting in Social Networks
Dr. Jiangchuan Liu

In this talk, I will give an overview of recent research in my lab on big data computing, crowdsourcing, social networking, online gaming, structural health monitoring, and related areas.
I will then present a systematic online prediction method (Social-Forecast) that is capable of accurately forecasting the popularity of videos promoted by social media. Social-Forecast explicitly considers the dynamically changing and evolving propagation patterns of videos in social media when making popularity forecasts, and is thereby situation- and context-aware. We analytically bound the prediction performance loss of Social-Forecast relative to an omniscient oracle and prove that the bound is sublinear in the number of video arrivals. We conduct extensive experiments using real-world data traces collected from videos shared on RenRen, one of the largest online social networks in China. These experiments show that our proposed method outperforms existing view-based approaches to popularity prediction (which are not context-aware) by more than 30% in terms of prediction rewards.
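The abstract does not spell out the Social-Forecast algorithm, so the sketch below illustrates only the general idea of context-aware online forecasting, under our own assumptions: each video arrives with a context vector (the two features used here, early share count and hours since upload, are hypothetical), contexts are bucketed into cells by hand-picked thresholds, and the forecast for a new video is the popularity class seen most often so far in its cell. It is a simple per-context frequency baseline, not Social-Forecast, and it carries no performance bound.

```python
import bisect
from collections import Counter, defaultdict


class ContextForecaster:
    """Hypothetical per-context online forecaster (not Social-Forecast itself)."""

    def __init__(self, bin_edges, default_class="low"):
        # One sorted list of thresholds per context feature, e.g. early share
        # count and hours since upload (both assumed features).
        self.bin_edges = bin_edges
        self.default_class = default_class
        # context cell -> counts of popularity classes observed in that cell
        self.counts = defaultdict(Counter)

    def cell(self, context):
        """Map a context vector to its discrete cell (one bin index per feature)."""
        return tuple(bisect.bisect_right(edges, x)
                     for edges, x in zip(self.bin_edges, context))

    def forecast(self, context):
        """Predict the most frequent class seen so far in this context's cell."""
        seen = self.counts.get(self.cell(context))
        if not seen:
            return self.default_class
        return seen.most_common(1)[0][0]

    def update(self, context, observed_class):
        """After the true popularity is revealed, record it for future forecasts."""
        self.counts[self.cell(context)][observed_class] += 1


# Online loop: forecast first, then learn from the revealed outcome.
model = ContextForecaster(bin_edges=[[10, 100], [1.0, 6.0]])
stream = [((50, 2.0), "high"), ((60, 3.0), "high"), ((5, 0.5), "low"), ((70, 4.0), "high")]
correct = 0
for context, actual in stream:
    correct += model.forecast(context) == actual
    model.update(context, actual)
```

Forecasting before updating mirrors the online setting: each prediction uses only videos whose popularity has already been revealed. Finer cells (more thresholds) make forecasts more context-specific but leave fewer past observations per cell.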

11:00-11:30
Towards a Software-Defined Architecture for Wireless Sensor Networks
Dr. Wei Dong

Software-defined networking has been a hot research topic in the networking community. Software design for sensor networks, however, differs from that for the Internet in many ways. In this talk, we focus on the design of a software-defined architecture for wireless sensor networks. We present a novel taxonomy of Software Defined Sensor Networks (SDSN) according to different abstractions of functionality. We examine the major challenges towards a generic and efficient SDSN architecture. We introduce some useful techniques that are either adopted in existing solutions or can be used to address some of these challenges. We will also introduce our recent work on software-defined sensing and software updates in sensor networks. We believe that a well-designed SDSN architecture can greatly facilitate software development for deployed networks, allowing rapid technical innovation.

11:30-12:00
Cyborg Intelligence: Towards the Convergence of Machine and Biological Intelligence
Dr. Gang Pan

Recent advances in the multidisciplinary fields of brain-machine interfaces, artificial intelligence, computational neuroscience, microelectronics, and neurophysiology signal a growing convergence between machines and living beings. Brain-machine interfaces (BMIs) enable direct communication pathways between the brain and an external device, making it possible to connect organic and computing parts at the signal level. A cyborg is a biological-machine system consisting of both organic and computing components. Cyborg intelligence aims to deeply integrate machine intelligence with biological intelligence by connecting machines and living beings via BMIs, combining biological cognitive capability with machine computational capability so that each enhances the other's strengths and compensates for its weaknesses. This talk will introduce the concept, architectures, and applications of cyborg intelligence.

12:00-12:30
The Small Data Problem
Dr. Arrvindh Shriraman

Today, power constraints determine our ability to keep compute units active and busy. By 2022, we can only keep one-fifth of a chip active; the other four-fifths will be dark (shut off) unless we reduce energy consumption. Interestingly, storing and moving the data used and produced by a computation consumes more energy than the computation itself. Whether on multicores, GPUs, or fixed-function accelerators, how we move data and feed the compute units has a critical impact on the programming model and on compute efficiency. We observe that, unlike the latency overhead of data movement, which can potentially be hidden, the energy overhead dictates that we fundamentally reduce waste in the memory hierarchy. I will provide an overview of the research in my group that adapts data storage and movement to application characteristics to improve energy efficiency.

Graduate Student Poster Session

14:00-15:30

Students from Zhejiang University

Efficient Metric Indexing for Similarity Search
Lu Chen

I2RS: Distributed Geo-textual Image Retrieval and Recommendation System
Zhihao Xing

Cross-Modal Learning to Rank via Latent Joint Representation
Xinyang Jiang

Jointly Discovering Fine-grained and Coarse-grained Sentiments via Topic Modeling
Hanqi Wang

ZiXOR: Improving ZigBee Performance Under Wi-Fi Interference Leveraging Corruption Burstiness
Zhiwei Zhao

Mosaic: A Low-Cost Mobile Sensing System for Urban Air Quality Monitoring
Gonglong Chen

Sparse Principal Component Analysis via Rotation and Truncation
Zhenfang Hu

Mining User Attributes Using Large-scale APP Lists of Smartphones
Shao Zhao

Students from Simon Fraser University

DASX: Hardware Accelerators for Collecting Software Data structures
Snehasish Kumar

Recommending Groups to Users Using User-Group Engagement and Time-Dependent Matrix Factorization
Xin Wang

Visual Cue-guided Rat Cyborg for Automatic Navigation
Minlong Lu

Utilizing Cloud to Empower Crowdsourced BigData Applications
Di Fu

Parallel Field Alignment for Cross Media Retrieval
Xiangbo Mao

A Unified Framework for Influence Maximization
Yu Yang

Breakout Sessions on Potential Research Collaborations

16:00-17:30

Networks and Systems
Coordinators:
Dr. Jiangchuan Liu (SFU)
Dr. Arrvindh Shriraman (SFU)
Dr. Wei Dong (ZJU)

Multimedia and Machine Learning
Coordinators:
Dr. Ze-Nian Li (SFU)
Dr. Gang Pan (ZJU)
Dr. Fei Wu (ZJU)

Data Mining and Management
Coordinators:
Dr. Martin Ester (SFU)
Dr. Jian Pei (SFU)
Dr. Xiaofei He (ZJU)
Dr. Yunjun Gao (ZJU)
Dr. Deng Cai (ZJU)

Saturday, June 27

Tutorial

09:00-11:00

09:00-11:00
Data Mining and Recommendation in Social Networks
Dr. Martin Ester

Abstract:
With the emergence of online social networks, academia and industry have explored ways to exploit the information in social networks to improve the quality of recommendations and to support new recommendation tasks. The underlying motivation is to capture the effects that govern the evolution of social networks, i.e., social influence, selection, correlational influence and transitivity, to enhance the typically very sparse rating matrix. Recommender systems exploiting a social network promise to outperform traditional recommenders, in particular for cold-start (new) users who have not yet provided enough information about their preferences. After introducing the motivation and some of the practical applications, we discuss social networks and the main factors affecting their evolution. We then review state-of-the-art methods for item recommendation in social networks, both memory-based and model-based approaches, in particular matrix factorization. We discuss friend recommendation, an important recommendation task that is unique to the context of social networks. We conclude the tutorial with a discussion of future research directions, such as the explanation of recommendations and the combination of machine and human computation.
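To make the model-based idea concrete, here is a minimal sketch of matrix factorization with a social-regularization term, one common way to let the social network help fill in a sparse rating matrix. The variant, its hyperparameters (`k`, `lr`, `reg`, `beta`), and the toy data are our own assumptions for illustration, not necessarily a method covered in the tutorial; in particular, the social step simply pulls each user's latent vector toward the average of their friends' vectors while holding the friends fixed.

```python
import random


def dot(a, b):
    """Inner product of two latent vectors."""
    return sum(x * y for x, y in zip(a, b))


def social_mf(ratings, friends, n_users, n_items, k=2, lr=0.02, reg=0.05,
              beta=0.5, epochs=2000, seed=0):
    """SGD matrix factorization with a simple social-regularization step.

    ratings: list of (user, item, rating) triples (the sparse rating matrix).
    friends: dict mapping a user to the list of users they are connected to.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        # Fit observed ratings: r_ui is approximated by dot(U[u], V[i]).
        for u, i, r in ratings:
            err = r - dot(U[u], V[i])
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
        # Social step: move each user's vector toward the mean of their
        # friends' vectors (friends are held fixed in this simplified update).
        for u, fs in friends.items():
            if not fs:
                continue
            for f in range(k):
                mean = sum(U[v][f] for v in fs) / len(fs)
                U[u][f] -= lr * beta * (U[u][f] - mean)
    return U, V


# Toy data: users 0 and 1 have opposite tastes; user 2 is a cold-start user
# with no ratings who is a friend of user 0.
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 1), (1, 1, 5)]
U, V = social_mf(ratings, friends={2: [0]}, n_users=3, n_items=2)
```

User 2 has no ratings at all, so plain matrix factorization would leave its latent vector at its random initialization; the social step instead moves it toward user 0's vector, so user 2 inherits user 0's preference for item 0 over item 1. This is the cold-start benefit the abstract describes.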

Short Bio:
Martin Ester is Professor of Computing Science at Simon Fraser University in Burnaby, British Columbia. He received his Diplom (M.Sc.) in Computer Science from the University of Dortmund, Germany, in 1984 and his Ph.D. in Computer Science from ETH Zurich, Switzerland, in 1990. He is a co-director of SFU's Databases and Data Mining Laboratory. His research has been published in the top venues of his field, such as KDD, WWW, and RecSys. He received the KDD 2014 Test of Time Award for his work on DBSCAN. Recently, he served as PC Co-Chair of ASONAM 2014 and RecSys 2014.

Research Session

Multimedia

11:30-12:30

11:30-12:00
Hierarchical Structures for Object Detection and Recognition
Dr. Ze-Nian Li

Object detection and recognition aim to localize visual objects in images and videos and to recognize their corresponding classes/identities and activities. Over the last decades, the amount of image and video data has been growing exponentially, which poses great challenges to Computer Vision and Multimedia research. Studies of hierarchical structures in biological vision have long influenced computer vision. In spite of good progress, the general object detection and recognition problems remain unsolved. In recent years, many researchers have shifted their focus to specific methods for specific tasks. Often, rather "flat" schemes are used that involve simple feature descriptors and task-dependent machine learning algorithms. Although some of these methods are very effective for their specific tasks, they inevitably have serious limitations and often work well only on the datasets they were trained on. This talk will survey some of the research in this area and offer some insights into possible solutions.

12:00-12:30
Deep embedding of multi-modal data for cross-media retrieval
Dr. Fei Wu

Cross-modal retrieval is a very active research topic that is essential to many applications involving multi-modal data. Discovering an appropriate representation (embedding) for multi-modal data is key to improving cross-media retrieval. This talk will first introduce multi-modal embedding in terms of the latent structures (i.e., topics) of multi-modal data via mutual topic reinforce modeling. Then the use of listwise ranking-based methods for multi-modal embedding will be described. Finally, this talk will focus on multi-modal embedding via deep learning.

Research Session

Data Mining and Management

14:00-15:30

14:00-14:30
Hashing for Large Scale Nearest Neighbor Search
Dr. Deng Cai

Recently, hashing techniques have been widely applied to the approximate nearest neighbor search problem in many real applications. The basic idea of these approaches is to generate binary codes for data points that preserve the similarity between any two of them. Given a query, instead of performing a linear scan of the entire database, a hashing method can perform a linear scan of only the points whose Hamming distance to the query is not greater than $r_h$, where $r_h$ is a constant. However, in order to find the true nearest neighbors, both the locating time and the linear scan time are $O\left(\sum_{i=0}^{r_h} \binom{c}{i}\right)$ (where $c$ is the code length), which increases exponentially as $r_h$ increases. To address this limitation, we propose a novel algorithm named iterative expanding hashing, which builds an auxiliary index based on an offline-constructed nearest neighbor table to avoid a large $r_h$. This auxiliary index can easily be combined with all traditional hashing methods. Extensive experimental results on various real large-scale datasets demonstrate the superiority of the proposed approach.
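The cost term above counts the binary codes inside a Hamming ball of radius $r_h$ around a $c$-bit query code. The sketch below, which assumes a plain Python dictionary as the hash table (bucket code mapped to a list of item ids), computes that count and probes every bucket in the ball. It shows the baseline lookup whose cost iterative expanding hashing is designed to avoid, not the proposed algorithm itself.

```python
from itertools import combinations
from math import comb


def hamming_ball_size(c: int, r: int) -> int:
    """Number of c-bit codes within Hamming distance r: sum_{i=0}^{r} C(c, i)."""
    return sum(comb(c, i) for i in range(r + 1))


def probe_codes(query: int, c: int, r: int):
    """Yield every c-bit code whose Hamming distance to `query` is at most r."""
    for i in range(r + 1):
        for positions in combinations(range(c), i):
            code = query
            for p in positions:
                code ^= 1 << p  # flip bit p of the query code
            yield code


def hamming_ball_search(table: dict[int, list[str]], query: int, c: int, r: int) -> list[str]:
    """Collect the items of every bucket inside the Hamming ball (baseline probe)."""
    hits = []
    for code in probe_codes(query, c, r):
        hits.extend(table.get(code, []))
    return hits


# Hypothetical 4-bit hash table: bucket code -> item ids.
buckets = {0b0000: ["a"], 0b0001: ["b"], 0b0011: ["c"], 0b1111: ["d"]}
near = hamming_ball_search(buckets, query=0b0000, c=4, r=1)
```

For 32-bit codes, `hamming_ball_size(32, r)` gives 1, 33, 529, 5489, and 41449 buckets for $r = 0, \dots, 4$, which is why probing with a large radius quickly becomes expensive.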

14:30-15:00
Big Data for Everyone
Dr. Jian Pei

Big Data poses grand opportunities and challenges for egocentric analytics. In this talk, I will discuss several interesting problems centered on egocentric queries and analysis on Big Data. We want to answer a series of natural questions that arise in several killer applications, such as "How is this patient similar to or different from the other Type II diabetes patients in the database?", "How is University X distinct from the other universities?", and "How is this residential property distinct from the others available in the market?" To answer such questions on Big Data, we have to search data of high dimensionality and high volume, and possibly of high dynamics as well. I will present some preliminary research results and application case studies we obtained recently, as well as further challenges we have identified.
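As a toy illustration of an egocentric question such as "How is University X distinct from the other universities?", the sketch below ranks attributes by how extreme the query object's value is among the other objects, using a mid-rank percentile per attribute. It is our own single-attribute heuristic applied to made-up data; the talk's methods, which must search high-dimensional, high-volume data, are not described in the abstract.

```python
def distinctive_attributes(query, others, top=2):
    """Rank attributes by how far the query's value sits from the middle of
    the distribution formed by `others` (0.0 = median, 0.5 = most extreme)."""
    scores = {}
    for attr, value in query.items():
        column = [o[attr] for o in others]
        below = sum(v < value for v in column)
        equal = sum(v == value for v in column)
        # Mid-rank percentile of the query value among the other objects.
        percentile = (below + 0.5 * equal) / len(column)
        scores[attr] = abs(percentile - 0.5)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]


# Made-up attributes for four other universities and one query university.
others = [
    {"research_funding": 100, "enrollment": 10, "student_faculty_ratio": 12},
    {"research_funding": 200, "enrollment": 40, "student_faculty_ratio": 15},
    {"research_funding": 300, "enrollment": 20, "student_faculty_ratio": 18},
    {"research_funding": 400, "enrollment": 30, "student_faculty_ratio": 21},
]
query = {"research_funding": 900, "enrollment": 25, "student_faculty_ratio": 19}
answer = distinctive_attributes(query, others)
```

Here the query university stands out most on research funding (it exceeds every other institution) and somewhat on student-faculty ratio, while its enrollment sits at the median. Real egocentric analysis would also need to consider combinations of attributes (subspaces), whose number grows exponentially with dimensionality.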

15:00-15:30
Efficient Indexing for Querying Metric Spaces
Dr. Yunjun Gao

Spatial queries, including similarity search, similarity joins, and aggregate k nearest neighbor queries, are useful in many areas, such as multimedia retrieval, data integration, and computational biology, to name but a few. However, they are not yet well supported by commercial DBMSs. This may be due to the complex data types involved and the need for flexible similarity criteria in real applications. In this talk, we propose an efficient disk-based metric access method, the Space-filling curve and Pivot-based B+-tree (SPB-tree), to accelerate query processing and support a wide range of data types and similarity metrics. The SPB-tree uses a small set of so-called pivots to significantly reduce the number of distance computations, utilizes a space-filling curve to cluster the data into compact regions, thus improving storage efficiency, and employs a B+-tree with minimum bounding box information as the underlying index. The SPB-tree also uses a separate random access file to efficiently manage large and complex data. By design, the SPB-tree is easy to integrate into an existing DBMS. We also present efficient similarity search, similarity join, and aggregate k nearest neighbor query algorithms, and corresponding cost models, based on the SPB-tree. In addition, we develop a distributed geo-textual image retrieval and recommendation system (I2RS), which employs SPB-trees to index geo-textual images and uses metric similarity queries, including time-aware range and k nearest neighbor queries, to provide a variety of geo-textual image retrieval and recommendation services.
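The two ideas this abstract combines, pivot-based pruning and a space-filling-curve key, can be sketched in a few lines. This is a toy in-memory illustration under our own assumptions: 2-D points with Euclidean distance, hand-picked pivots, a Z-order curve (chosen here for brevity), and a sorted Python list in place of the disk-based B+-tree. Unlike the SPB-tree, the range query below scans every entry rather than using key ranges and minimum bounding boxes, so it only demonstrates the filtering step.

```python
import math
from typing import Callable, Sequence

Point = tuple[float, float]


def morton_key(coords: Sequence[int], bits: int = 16) -> int:
    """Z-order (Morton) key: interleave the bits of non-negative integer coordinates."""
    key = 0
    n = len(coords)
    for b in range(bits):
        for i, v in enumerate(coords):
            key |= ((v >> b) & 1) << (b * n + i)
    return key


class PivotIndex:
    """Toy pivot-based metric index (illustration only, not the SPB-tree)."""

    def __init__(self, objects: list[Point], pivots: list[Point],
                 dist: Callable[[Point, Point], float], cell: float = 1.0):
        self.pivots, self.dist = pivots, dist
        entries = []
        for obj in objects:
            vec = [dist(obj, p) for p in pivots]             # map into pivot space
            key = morton_key([int(d // cell) for d in vec])  # discretize, then Z-order
            entries.append((key, vec, obj))
        # A sorted list stands in for the B+-tree keyed on the curve value.
        self.entries = sorted(entries, key=lambda e: e[0])
        self.verified = 0  # number of d(q, o) computations during queries

    def range_query(self, q: Point, r: float) -> list[Point]:
        qvec = [self.dist(q, p) for p in self.pivots]
        result = []
        for _, vec, obj in self.entries:
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p,
            # so a lower bound above r rules o out without computing d(q,o).
            if max(abs(a - b) for a, b in zip(qvec, vec)) > r:
                continue
            self.verified += 1
            if self.dist(q, obj) <= r:
                result.append(obj)
        return result


points = [(float(x), float(y)) for x in range(11) for y in range(11)]
index = PivotIndex(points, pivots=[(0.0, 0.0), (10.0, 10.0)], dist=math.dist)
hits = index.range_query((5.0, 5.0), 1.5)
```

Only candidates that survive the pivot bound are verified with a real distance computation; in this example the corners of the grid are discarded without computing their distance to the query. Ordering entries by the curve key keeps objects with similar pivot distances next to each other, which is the clustering effect the abstract attributes to the space-filling curve.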
