Jiannan Wang

Associate Professor
School of Computing Science
Simon Fraser University

Postdoc in the AMPLab at UC Berkeley (2015)
Ph.D. at THU (2013), B.Sc. at HIT (2008)

Research Areas: Database Systems, Big Data Science

Office: TASC 1 9237
Phone: 1-778-782-4288
Email: jnwang@sfu.ca

8888 University Drive
Burnaby, BC

Open Positions: If you are an SFU undergraduate student and would like to work with me on DataPrep, please email me your resume and transcript with a short statement of research interests.

My lab is part of the SFU Data Science Research Group. Our mission is to speed up data science. We develop innovative technologies and open-source tools so that data scientists can turn raw data into actionable insights more efficiently. I recently won an IEEE TCDE Rising Star Award (2018) and a CS-Can|Info-Can Outstanding Early Career Researcher Award (2020).

I am a General Co-chair for VLDB 2023, an Associate Editor for VLDB 2021, and an Associate Editor of Data Science of Frontiers in Big Data (2021 - now). I am the director of the SFU CS Professional Master's Program (2019 - now). Our vision is to train the next generation of technical global leaders in strategically important areas of society. I am also a founding editor-in-chief of the SFU Big Data Science Publication, which has now become one of the most popular Data Science publications on Medium.


I am teaching CMPT 733: Big Data Programming and CMPT 354: Database Systems I for Spring Semester 2022.
We are super excited to announce the release of ConnectorX 0.2 (a subproject in dataprep.ai). ConnectorX is the fastest library to load data from DB to DataFrames in Rust and Python. It can accelerate Pandas read_sql by 10x with one line of code. Since its first release, the library has been downloaded ~12,000 times. Please check out our blog post and benchmark results.
Want to know how to debug an ML model in federated learning? Please check out our recent paper, entitled "Enabling SQL-based Training Data Debugging for Federated Learning", in VLDB 2022!
I was invited to give two talks in the Thomson Reuters AI@TR Invited Speaker Series on our AutoML-EM and DataPrep projects.
Congratulations to Brandon Lockhart on his successful M.Sc. thesis defense! In his thesis, he studies "Explaining Inference Queries with Bayesian Optimization". Learn more about Brandon and his research from this page.
Congratulations to Dr. Pei Wang on her successful Ph.D. thesis defense! Her thesis is about "Automating Data Preparation with Statistical Analysis", which covers her work on AutoML-EM@ICDE 2021, ActiveDeeper Demo@VLDB 2020, Deeper@SIGMOD 2019, Uni-Detect@SIGMOD 2019, and Deeper Demo@SIGMOD 2018.
Want to know whether we are ready to deploy learned cardinality models in production database systems? Please check out our recent paper in VLDB 2021!
Our DataPrep.EDA paper was accepted by SIGMOD 2021!
I am serving as a General Co-chair for VLDB 2023 @ Vancouver.
I was invited to give a talk at Databricks to introduce DataPrep: The easiest way to prepare data in Python.
Congratulations to Xiaoying Wang on her successful M.Sc. thesis defense! In her thesis, she studies "Are We Ready For Learned Cardinality Estimation?".
A new paper on "Automating Entity Matching Model Development" was accepted by ICDE 2021!
My Ph.D. students (Pei Wang and Weiyuan Wu) gave a talk at PyData Global 2020, the world's premier data science conference (video).
We are pleased to announce the launch of DataPrep's brand new website: http://dataprep.ai!
Received the "Distinguished PVLDB Review Board Member" award!
Welcome to Danrui Qi (Ph.D.), who has joined the lab! Congratulations to her on receiving the prestigious Graduate Dean's Entrance Scholarship (GDES).
Promoted to Associate Professor (Tenure)!
We wrote two blog posts to describe i) how to use DataPrep.EDA to accelerate EDA and ii) why DataPrep.EDA is better than Pandas-profiling.
We are super excited to announce the release of DataPrep 0.2. DataPrep aims to become the "scikit-learn" of data preparation. Since its first release, the library has been downloaded ~4,000 times. This release contains a data connector component to facilitate web data collection and an exploratory data analysis component to enable fast data understanding. More components will be added in future releases.
Want to know how to debug training data for SQL-ML queries? Please check out our recent paper, entitled "Complaint-driven Training Data Debugging for Query 2.0", in SIGMOD 2020!
Want to know how to detect data errors for machine learning applications? Please check out our recent paper, entitled "SCODED: Statistical Constraint Oriented Data Error Detection", in SIGMOD 2020!
I am honored to be invited to serve as an Associate Editor for VLDB 2021.

Recent Publications [DBLP] [Google Scholar]



Current Students and Postdocs

Former Students

  • Pei Wang, Ph.D., 2016-2021 (Next Huawei Canada Senior Engineer)
  • Lydia Zheng, M.Sc., 2019-2021 (Next Workday Software Application Developer)
  • Yi Xie, M.Sc., 2019-2021 (Next Amazon Web Services Software Engineer)
  • Brandon Lockhart, M.Sc., 2019-2021 (Next Zafin Data Engineer)
  • Yejia Liu, Visiting Student, 2020-2021 (Next UC Riverside Ph.D. Student)
  • Qingcan Li, Undergrad RA, 2019-2020 (Next Google Software Engineer)
  • Ruochen Jiang, M.Sc., 2018-2019 (Next OSU Ph.D. Student)
  • Young Wu, M.Sc., 2017-2019 (Next SFU Ph.D. Student)
  • Xi Yang, Undergrad RA, 2018-2019 (Next Alibaba Software Engineer)
  • Liang Zhao, Postdoc, 2018-2019 (Next Assistant Professor at Xi'an Jiaotong University)
  • Shubham Laddha, Visiting Student, 2019
  • Danrui Qi, Visiting Student, 2019
  • Song Bian, Visiting Student, 2019 (Next CUHK M.Sc. Student)
  • Xiao Li, Undergrad RA, 2019
  • Mohamad Dolatshah, M.Sc., 2016-2018
  • Mathew Teoh, Undergrad RA, 2016-2017 (Next Data Scientist at Quora)
  • Yongjun He, Visiting Student, 2017 (Next SFU M.Sc. Student)
  • Nathan Yan, Visiting Student, 2017 (Next Cornell Ph.D. Student)





Awards

  • VLDB Best Experiments, Analysis & Benchmark Paper Award (2021)
  • CS-Can|Info-Can Outstanding Early Career Researcher Award (2020)
  • PVLDB Distinguished Review Board Member Award (2020)
  • IEEE TCDE Rising Star Award (2018)
  • IEEE/WIC/ACM Web Intelligence (WI) Best Paper Award (2017)
  • ACM SIGMOD Best Demonstration Award (2016)
  • China Computer Federation (CCF) Distinguished Dissertation Award (2013)
  • Google Ph.D. Fellowship (2011)


Professional Activities

Program Committee

  • SIGMOD: 2022, 2021, 2020, 2019 (Core PC), 2018, 2017, 2016, 2016 (Demo)
  • VLDB: 2022, 2021 (Associate Editor), 2020, 2018, 2018 (Demo), 2017 (Demo)
  • KDD: 2021, 2020, 2019, 2018
  • ICDE/TKDE poster: 2021, 2017, 2016
  • SoCC: 2020
  • SIGIR: 2020
  • ICDCS: 2020
  • WWW: 2017
  • SDM: 2018
  • CIKM: 2017
  • HCOMP: 2019, 2016
  • DASFAA: 2019, 2019 (Demo)
  • WAIM: 2018, 2017, 2016, 2015, 2014
  • APWeb: 2017, 2016


  • VLDB 2023 General Co-chair
  • IEEE ICDE 2022 PhD Symposium Chair
  • SIGMOD 2017 Registration Chair
  • WISE 2017 Publicity Co-Chair


  • SFU Big Data Academic Advisory Committee Member (2017 - now)
  • SFU KEY / Vancity Relationship Steering Committee Member (2017 - now)


  Adapted from a template by Liwen Sun.