Events Calendar


  • AI SEMINAR - Incentivizing Exploration [joint work with Peter Frazier, Jon Kleinberg, Robert Kleinberg]

    Fri, Sep 19, 2014 @ 11:00 AM - 12:00 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars


    Speaker: David Kempe, USC CS Dept. Associate Professor and Associate Chair for Undergraduate Programs

    Talk Title: Incentivizing Exploration [joint work with Peter Frazier, Jon Kleinberg, Robert Kleinberg]

    Series: AISeminar

    Abstract: We study a Bayesian multi-armed bandit (MAB) setting in which a principal seeks to maximize the sum of expected time-discounted rewards obtained by pulling arms, when the arms are actually pulled by selfish and myopic individuals. Since such individuals pull the arm with the highest expected posterior reward (i.e., they always exploit and never explore), the principal must incentivize them to explore by offering suitable payments. Among other applications, this setting models crowdsourced information discovery and funding agencies that incentivize scientists to perform high-risk, high-reward research.
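
    To make the setting concrete, below is a minimal simulation sketch (Python), not taken from the talk or paper: a two-armed Bayesian Bernoulli bandit in which each arriving myopic agent pulls the arm maximizing its expected posterior reward plus whatever payment the principal offers on that arm. The priors, the fixed per-arm payments, and all names are illustrative assumptions.

    ```python
    import random

    class BetaArm:
        """Bernoulli arm with a Beta(alpha, beta) posterior over its success probability."""
        def __init__(self, true_p, alpha=1.0, beta=1.0):
            self.true_p, self.alpha, self.beta = true_p, alpha, beta

        def posterior_mean(self):
            return self.alpha / (self.alpha + self.beta)

        def pull(self):
            reward = 1.0 if random.random() < self.true_p else 0.0
            self.alpha += reward
            self.beta += 1.0 - reward
            return reward

    def run(arms, payments, gamma=0.9, horizon=50):
        """Myopic agents maximize posterior mean plus offered payment; the principal
        tracks time-discounted rewards and time-discounted payments separately."""
        disc_reward = disc_payment = 0.0
        for t in range(horizon):
            choice = max(range(len(arms)),
                         key=lambda i: arms[i].posterior_mean() + payments[i])
            disc_reward += gamma ** t * arms[choice].pull()
            disc_payment += gamma ** t * payments[choice]
        return disc_reward, disc_payment

    def fresh_arms():
        # Arm 0: well-understood, mean ~0.5; arm 1: unexplored but actually better (0.7).
        return [BetaArm(0.5, alpha=5, beta=5), BetaArm(0.7, alpha=1, beta=1)]

    random.seed(0)
    print(run(fresh_arms(), payments=[0.0, 0.0]))   # no incentives: agents exploit the posterior leader
    print(run(fresh_arms(), payments=[0.0, 0.05]))  # a small payment on arm 1 tips agents toward exploring it
    ```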

    We explore the tradeoff between the principal's total expected time-discounted incentive payments and the total time-discounted rewards realized. Specifically, with a time-discount factor gamma in (0,1), let OPT denote the total expected time-discounted reward achievable in a MAB instance by a principal who pulls arms directly, without having to incentivize selfish agents. We call a (payment, reward) combination (b,rho) in [0,1]^2 achievable if, for every MAB instance, using expected time-discounted payments of at most b*OPT, the principal can guarantee an expected time-discounted reward of at least rho*OPT. Our main result is a complete characterization of achievable (payment, reward) pairs: (b,rho) is achievable if and only if sqrt(b) + sqrt(1-rho) >= sqrt(gamma).
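
    The characterization is a one-line condition on (b, rho, gamma); the snippet below simply transcribes it and evaluates a few boundary points read off the formula itself (the function name and the numeric examples are illustrative, not from the paper).

    ```python
    from math import sqrt

    def achievable(b, rho, gamma):
        """(b, rho) in [0,1]^2 is achievable for discount factor gamma in (0,1)
        iff sqrt(b) + sqrt(1 - rho) >= sqrt(gamma)."""
        return sqrt(b) + sqrt(1.0 - rho) >= sqrt(gamma)

    gamma = 0.9
    print(achievable(0.0, 1.0, gamma))          # False: full reward with zero payments
    print(achievable(gamma, 1.0, gamma))        # True: budget b = gamma suffices for rho = 1
    print(achievable(0.0, 1.0 - gamma, gamma))  # True: zero payments still allow rho = 1 - gamma
    ```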

    In proving this characterization, we analyze so-called time-expanded policies, which in each step let the agents choose myopically with some probability p and incentivize them to choose "optimally" with probability 1-p. The analysis of time-expanded policies leads to a question that may be of independent interest: if the same MAB instance (without selfish agents) is considered under two different time-discount factors gamma and eta, how small can the ratio of OPT(eta) to OPT(gamma) be? We give a complete answer to this question, showing that OPT(eta) >= ((1-gamma)^2/(1-eta)^2) * OPT(gamma), and that this bound is tight.
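
    As a minimal numeric sanity check of the quoted ratio bound, consider the simplest possible instance: a single deterministic arm paying reward 1 on every pull, for which the total time-discounted reward under discount factor delta is 1/(1-delta) in closed form. The instance and the choice eta < gamma below are illustrative assumptions, not taken from the paper.

    ```python
    def opt_single_deterministic_arm(delta):
        """Total time-discounted reward of an arm paying 1 each pull:
        sum over t >= 0 of delta^t = 1 / (1 - delta)."""
        return 1.0 / (1.0 - delta)

    gamma, eta = 0.9, 0.5
    lhs = opt_single_deterministic_arm(eta)
    rhs = ((1.0 - gamma) ** 2 / (1.0 - eta) ** 2) * opt_single_deterministic_arm(gamma)
    print(lhs, rhs, lhs >= rhs)  # 2.0 0.4 True
    ```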


    Biography: David Kempe received his Ph.D. from Cornell University in 2003, and has been on the faculty in the computer science department at USC since the Fall of 2004, where he is currently an Associate Professor and Associate Chair for Undergraduate Programs.

    His primary research interests are in computer science theory and the design and analysis of algorithms, with a particular emphasis on social networks, algorithms for feature selection, and game-theoretic and pricing questions. He is a recipient of the NSF CAREER award, the VSoE Junior Research Award, the ONR Young Investigator Award, a Sloan Fellowship, and an Okawa Fellowship, in addition to several USC mentoring awards.

    ***At the speaker's request, there will be no live webcast; the talk will be recorded for internal viewing only.***

    Host: Greg Ver Steeg

    Location: Information Sciences Institute (ISI) - 1135

    Audiences: Everyone Is Invited

    Contact: Alma Nava / Information Sciences Institute

