CS 8395-03 - Visual Analytics & Machine Learning


Syllabus

Course Description

This course is a research seminar on topics related to visual analytics and machine learning. Visual analytics is an area of data visualization concerned with improving a human’s analytic process, that is, how one makes sense of data for a given problem: understanding, reasoning about, and making decisions about a provided dataset within a given problem domain. In particular, visual analytics is concerned with combining automated processes with human-driven processes that are built around data visualization: visual representations of data, and ways to interact with data. Given the rapid growth in machine learning over the last decade, research in visual analytics has seen similar growth in leveraging machine learning in a variety of ways. This course will cover topics at the interface of visual analytics and machine learning, exposing you to the basics of visual analytics, how machine learning can be used to enhance visual analytics, and how visual analytics can help machine learning.

Learning Objectives

It is expected that you will learn the following by taking this course:

The latest research in the field of visual analytics, as it pertains to machine learning.
The ability to critique visualization designs, in terms of satisfying good visualization design principles and the user’s analytical needs.
What is involved in designing visualizations for creating visual analytics systems.
Authoring visualizations using web-based technologies.

Instructor

Matthew Berger

email: matthew.berger@vanderbilt.edu

office hours: TR 2:00-3:00, JH 379

Lectures

MW, 2:10-3:25, FGH 258

Content

This course will cover four primary research areas:

Mixed-Initiative Visual Exploration

One of the main goals of data visualization is to enable the human to better understand their data through visual exploration. Machine learning techniques can improve this form of exploration by establishing an effective blend between the automated analyses provided by a learning technique and what is exposed to the user to guide their interactions.

Visual Analytics for Understanding Models

The growth in machine learning has been accompanied by an equally pressing demand to understand machine learning models, e.g. to provide interpretable and explainable models. Visual analytics plays an important role in helping the user understand machine learning models, be it through understanding the training process of a model, the parameters of a model, the features a model learns from a given set of data, or the outputs a model produces.

Visual Analytics for Training Models

In machine learning, the training of a model is traditionally accomplished by a human identifying a training dataset and then training the model, sometimes using a validation set to tune hyperparameters. Opening this process up, however, can enable visual analytics techniques to improve how models are trained, either by improving how humans annotate the data used for training, or by incorporating the human directly into the model-building process.

Learning Visualization

Machine learning can also be used to improve the visualization process itself. Approaches range from recommending visualizations, to automating (or semi-automating) the creation of visualizations from a provided dataset, to constructing learning models for visualization techniques.

In addition, the course will cover the basics of designing data visualizations, ranging from basic visualization principles to coding data visualizations for the web using D3.

Course Format

The course will primarily be lecture-based. There is no textbook for the course - all lectures will be based on papers I have listed in the papers section of the website. The schedule section lists papers that will be covered during each lecture. It is expected that, prior to the lecture, you have read the corresponding papers.

Class participation is expected during lectures. As you will quickly see, designing an effective visual analytics solution often boils down to making good decisions. Put simply, there are lots of approaches to visualizing and interacting with data, but most are bad. Discerning good visualization choices from bad ones will be a common theme in the lectures, and should prove invaluable for the visual analytics techniques you develop for the class; thus, I expect everyone to participate in these discussions.

Assignments

In the first half of the semester you will be required to complete three programming assignments. These are intended to satisfy the following:

Exposure to the basics of coding visualizations using web technologies. Specifically, we will be primarily using JavaScript for programming, and in particular D3. Further, we will be using Observable notebooks for our development environment. We will also use a bit of Vega-Lite to slowly ease into things.
How to combine machine learning techniques with data visualization. This involves using JavaScript libraries such as TensorFlow.js, as well as matrix computation libraries, and combining these libraries with interactive data visualization as supported by D3. Indeed, the real power of visual analytics lies in making machine learning interactive.
Understanding basic visualization design principles. This involves understanding spatial organization, color design, interactions, amongst other design decisions. The reactive nature of Observable notebooks will greatly facilitate understanding the design space of authoring visualizations.

Research Paper Presentation

During the middle portion of the semester, you will be expected to present a research paper. You may choose from any of the appropriately denoted papers that are listed in the papers section. If you are interested in presenting a paper that is not listed, or alternatively, listed but not marked, then please contact me for approval.

In your presentation you will be required to address the following questions:

What problem is the paper solving?
What was the approach taken to solve the problem? Discuss the tasks that the paper intends to address, relevant machine learning techniques, and the developed visualization design.
What are the advantages and disadvantages of the approach? Provide a design critique, assessing the trade-offs employed by the paper. Note: the paper, very likely, will not discuss such trade-offs! You will need to think critically about what was involved in their design.
What are some alternative visualization designs? How else could the paper have solved the problem? It is recommended to provide a sketch of alternative designs.

The last point is crucial: the ability to iterate on multiple visualization designs, understanding their strengths and weaknesses, and deciding on a final design, are essential skills in authoring data visualizations.

Project

The latter half of the semester will be devoted to a research project. You will form a team of two, propose a project, develop a working prototype halfway through, and finally present your project to the class at the end of the semester. Please see the project section of the course for more details.

For the project, it is intended that you will use Observable notebooks. However, if you find that a client/server architecture would be more suitable for your needs, then please let me know in advance of your proposal. Nevertheless, D3 is a requirement for the project.

Course Assessment

Assignments: 30%
- Three assignments, each worth 10%
Class Participation: 20%
- Research paper presentation: 10%
- Active in-class discussion: 10%
Project: 50%
- Proposal: 10%
- Prototype: 10%
- Presentation: 10%
- Full Submission: 20%
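To make the weighting concrete, here is a minimal sketch of how the component percentages above could combine into one numeric grade. It assumes every component is scored on a 0-100 scale; the helper name `final_grade` and the component keys are hypothetical labels for illustration, not part of any course tooling.

```python
# Hypothetical helper: weights (in percent) follow the Course Assessment
# breakdown; every component score is ASSUMED to be on a 0-100 scale.
WEIGHTS = {
    "assignment_1": 10,
    "assignment_2": 10,
    "assignment_3": 10,
    "paper_presentation": 10,
    "discussion": 10,
    "proposal": 10,
    "prototype": 10,
    "project_presentation": 10,
    "full_submission": 20,
}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted numeric grade (0-100) from per-component scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Weighted sum in percent units, divided by 100 to return to a 0-100 scale.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, scoring 90 on every component except 80 on the full submission gives (8 × 10 × 90 + 20 × 80) / 100 = 88.0. Integer weights are used so the weights sum to exactly 100.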

Prerequisites

You should have a sufficient background in machine learning: basic understanding of unsupervised learning methods (e.g. dimensionality reduction, clustering), supervised learning methods (e.g. classification, regression), basics of optimization, and experience implementing machine learning techniques. You should also have a basic understanding of deep learning methods. Although we will review these methods as appropriate, you should not treat this course as an opportunity for understanding the details of machine learning techniques.

In addition, you should have sufficient background in linear algebra, e.g. the ability to comprehend matrix notation, and an understanding of basic matrix computations, especially matrix inversion, eigendecomposition, and singular value decomposition.

A background in data visualization is not necessary for this course. In the first part of the course, we will cover the fundamentals of data visualization, ranging from basic principles to authoring visualizations using JavaScript and D3. Nevertheless, some background with visualization systems such as matplotlib, ggplot2, Tableau, etc., will be useful.

Please see the resources page for resources related to JavaScript, SVG, D3, and Observable notebooks.

Discussion

We will use Brightspace for any discussion related to the course: questions on lecture content, project discussion, etc. Brightspace will also be used for announcements.

Lecture Slides

See schedule.

Grades

Your final grade will be numeric, and will be converted into a letter grade via the following:

97-100 : A+
94-96 : A
90-93 : A-
87-89 : B+
84-86 : B
80-83 : B-
77-79 : C+
74-76 : C
70-73 : C-
67-69 : D+
64-66 : D
60-63 : D-
< 60 : F
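The table can be read as a list of lower cutoffs. Below is a minimal sketch of that lookup. Because the table only lists integer ranges, it ASSUMES that each letter applies from its lower bound up to the next cutoff, so a fractional grade such as 96.5 maps to A rather than A+; the function name `letter_grade` is hypothetical.

```python
# Lower bounds from the Grades table, highest first.
# ASSUMPTION: a grade between two listed ranges (e.g. 96.5) falls to the
# lower letter, since only integer ranges are given.
CUTOFFS = [
    (97, "A+"), (94, "A"), (90, "A-"),
    (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"),
    (67, "D+"), (64, "D"), (60, "D-"),
]


def letter_grade(grade: float) -> str:
    """Map a numeric grade (0-100) to a letter grade (hypothetical helper)."""
    for lower_bound, letter in CUTOFFS:
        if grade >= lower_bound:
            return letter
    return "F"  # anything below 60
```

Scanning from the highest cutoff down means each grade matches the first range whose lower bound it meets.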

Late Submission Policy

For all deadlines associated with the course, the late submission policy is as follows:

One day late: 10% off
Two days late: 20% off
Past two days: no credit
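A minimal sketch of the late policy as arithmetic follows. The policy does not say whether "10% off" means 10 points or 10% of the earned score; this sketch ASSUMES the deduction is a fraction of the earned score and that lateness is counted in whole days. The helper `late_adjusted` is hypothetical, and class presentations, which follow a separate rule, are not modeled.

```python
def late_adjusted(score: float, days_late: int) -> float:
    """Apply the late policy to a 0-100 score (hypothetical helper).

    ASSUMPTIONS: days_late is a whole number of days, and the penalty is a
    percentage of the earned score, not a fixed number of points.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # 0 days: no penalty; 1 day: 10% off; 2 days: 20% off.
    penalty_percent = {0: 0, 1: 10, 2: 20}.get(days_late)
    if penalty_percent is None:  # past two days: no credit
        return 0.0
    return score * (100 - penalty_percent) / 100
```

Under these assumptions, a 90 submitted one day late becomes 81.0, two days late becomes 72.0, and three days late receives no credit.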

The exception is class presentations. You will be expected to present to the class three times throughout the semester:

Research paper presentation
Project proposal
Final project

For each of these presentations, no credit will be given if you do not present in your allotted time.

Department and University Academic Policies

Academic Honesty

Students should adhere to the Vanderbilt Honor System. Cheating or plagiarizing will not be tolerated in this course.

Do not copy, in any way, another student’s work when it comes to the project.
There are many resources on the web related to visualization and machine learning, and you may want to use some of this code as part of your project. This is fine with me; however, be aware that copying and pasting code from the web may not benefit you intellectually, and might do more harm than good for achieving certain aspects of the project. Having said all of this, if you decide to use pieces of code from external resources, cite them. If you do not, I will treat this as plagiarism.

Academic Integrity

More generally, students should act in accordance with the academic integrity policies of the university; please see Vanderbilt’s Academic Integrity for more information.

Privacy

All student data and information will be protected under FERPA laws. Please refer to the Vanderbilt Student Privacy Statement. Please take care to not disclose any private information during lectures and when submitting assignments.

Nondiscrimination and Anti-Harassment

Vanderbilt is committed to an environment that is free of discrimination and harassment of any kind. If you feel you are being sexually harassed, please see Project Safe. If you feel unsafe, taken advantage of in any way, or mentally/emotionally unwell, please reach out to the Student Care Network.

Subject to Change Statement

Information contained in the course syllabus, other than the general assessment, may be subject to change with advance notice, at the discretion of the instructor.