VBSD 2019

With machine learning's ever-increasing appetite for data, we need to face the reality that for many applications sufficient data may not be available. Even when raw data is plentiful, quality labeled data may be scarce, and even when labeled data is available, the labeled data relevant to a particular objective function may not be sufficient. The latter is often the case in tail-of-the-distribution problems, such as recognizing in autonomous driving that a baby stroller is rolling on the street. The event is rare in training and testing data, but it is highly critical for an objective function that accounts for personal injury and property damage. Even evaluating performance in such a situation is challenging. One may stage experiments geared towards particular situations, but there is no guarantee that the staging conforms to the natural distribution of events. Even if it does, high-dimensional distributions have many tail ends, which are by their nature hard to enumerate manually.

Recently the issue has been recognized more widely. DARPA, for instance, announced the Learning with Less Labels program, which aims to reduce the number of labels required by a million-fold across a wide set of problems, vision included. In addition, there is mounting evidence of the societal effects of data-driven bias in artificial intelligence, such as in hiring and policy making, with implications for non-governmental organizations as well as for corporate social responsibility and governance.

In this second workshop we would like to achieve two goals: (1) raise awareness by having experts from academia, government and industry share their perspectives, including on the impact of discriminatory biases in AI, and (2) share the latest advances on biased and scarce data problems and solutions through talks by distinguished speakers from academia and industry.

Program

Date: Sunday, June 16, 2019
Time: 08:30 - 13:00
Location: Hyatt Seaview B

Start Time	Title	Speaker

08:30	Welcome, Introductory Remarks	Jan Ernst (Siemens Research)
08:40	Tackling visual ambiguity: automated detection of hard examples	Animashree Anandkumar (Caltech & NVIDIA)
09:15	Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning	Timnit Gebru (Google)
09:50	Learning More from Less	John R. Smith (IBM T.J. Watson Research Center)
10:20	Coffee Break
10:35	Adapting to shifted data distributions	Kate Saenko (Boston University)
11:10	Forcing Vision + Language Models To Actually See, Not Just Talk	Devi Parikh (Georgia Tech & Facebook AI Research)
11:45	Practical aspects of fairness in recommendations	Chen Karako-Argaman (Shopify)
12:20	Structured knowledge for biased & scarce data	Matt Turek (DARPA) [video]
12:50	Closing Remarks	Jan Ernst