Have you ever felt overwhelmed by the sheer number and variety of different machine learning techniques? Over the last five decades, researchers have created literally thousands of machine learning algorithms, with many new ideas published each year. An engineer wanting to solve a problem using machine learning must choose one or more of these algorithms to try, and their choice is often constrained by those algorithms they happen to be familiar with. In this talk we introduce a new perspective called ‘model-based machine learning’, which recognises the fundamental role played by prior knowledge in all machine learning applications. It provides a compass to guide both newcomers and experts through the labyrinth of machine learning techniques, enabling the formulation and tuning of the appropriate algorithm for each application. We also show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning. Finally, we describe several large-scale commercial applications of this framework.
Chris Bishop is a Microsoft Technical Fellow and the Laboratory Director at Microsoft Research Cambridge. He is also Professor of Computer Science at the University of Edinburgh, and a Fellow of Darwin College, Cambridge. In 2004, he was elected Fellow of the Royal Academy of Engineering, in 2007 he was elected Fellow of the Royal Society of Edinburgh and in 2017 he was elected as a Fellow of the Royal Society.
Chris obtained a BA in Physics from Oxford, and a PhD in Theoretical Physics from the University of Edinburgh, with a thesis on quantum field theory. He then joined Culham Laboratory where he worked on the theory of magnetically confined plasmas as part of the European controlled fusion programme.
From there, he developed an interest in pattern recognition, and became Head of the Applied Neurocomputing Centre at AEA Technology. He was subsequently elected to a Chair in the Department of Computer Science and Applied Mathematics at Aston University, where he led the Neural Computing Research Group. Chris then took a sabbatical during which time he was principal organiser of the six-month international research programme on Neural Networks and Machine Learning at the Isaac Newton Institute for Mathematical Sciences in Cambridge, which ran in 1997.
After completion of the Newton Institute programme, Chris joined the Microsoft Research Laboratory in Cambridge.
What if I told you I had evidence of a serious threat to American national security: a terrorist attack in which a jumbo jet will be hijacked and crashed every 12 days? Thousands will continue to die unless we act now. This is the question before us today, but the threat doesn’t come from terrorists. The threat comes from climate change and air pollution. We have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the continental U.S., breaking the country up into 1-square-kilometer zones. We have paired that information with health data contained in Medicare claims records from the last 12 years, covering 97% of the population aged 65 or older. We have developed statistical methods and computationally efficient algorithms for the analysis of over 460 million health records. Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year. This data science platform is telling us that federal limits on the nation’s most widespread air pollutants are not stringent enough. This type of data is the sign of a new era for the role of data science in public health, and also for the associated methodological challenges. For example, with enormous amounts of data, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. These and other challenges will be discussed.
NPR
Los Angeles Times
New York Times
Podcast
Di Q, Wang Y, Zanobetti A, Wang Y, Koutrakis P, Dominici F, Schwartz J. (2017). Air Pollution and Mortality in the Medicare Population. New England Journal of Medicine, 376:2513-2522, June 29, 2017, DOI: 10.1056/NEJMoa1702747
Dr. Francesca Dominici is a data scientist whose pioneering scientific contributions have advanced public health research around the globe. Her life’s work has focused broadly on developing and advancing methods for the analysis of large, heterogeneous data sets to identify and understand the health impacts of environmental threats and inform policy. Dr. Dominici received her B.S. in Statistics from University La Sapienza in Rome, Italy and her Ph.D. in Statistics from the University of Padua in Italy. She did her postdoctoral training with Scott L. Zeger and Jonathan M. Samet at the Bloomberg School of Public Health at Johns Hopkins University. In 1999, she was appointed Assistant Professor at the Bloomberg School of Public Health and in 2007 she was promoted to Full Professor with tenure. Dr. Dominici was recruited to the Harvard T.H. Chan School of Public Health as a tenured Professor of Biostatistics in 2009. She was appointed Associate Dean of Information Technology in 2011 and Senior Associate Dean for Research in 2013. She is currently the Co-Director of the Harvard Data Science Initiative.
Governing through technology has proven irresistibly seductive. Technologists, system designers, advocates, and regulators increasingly seek to use the design of technological systems for the advancement of public policy: to protect privacy, advance fairness, or ensure law enforcement access, among others. Designing technology to “bake in” values offers a seductively elegant and potentially effective means of control. Technology can harden fundamental norms into background architecture, and its global reach can circumvent jurisdictional constraints, sometimes out of public view. As technology reaches into the farthest corners of our public and private lives, its power to shape and control human behavior, often imperceptibly, makes it an important locus for public policy. Yet while “governance-by-design” (the purposeful effort to use technology to embed values) is becoming a central mode of policy-making, our existing regulatory systems are ill-equipped to prevent that phenomenon from subverting public governance. Far from being a panacea, governance-by-design has undermined important governance norms and chipped away at important rights. In administrative agencies, courts, Congress, and international policy bodies, public discussions about embedding values in design arise in a one-off, haphazard way, if at all. Constrained by their structural limitations, these traditional venues rarely explore the full range of values that design might affect, and often advance a single value or occasionally pit one value against another. They seldom permit a meta-discussion about when and whether it is appropriate to enlist technology in the service of values at all. And policy discussions almost never include designers, engineers, and those who study the impact of socio-technical systems on values. When technology is designed to regulate without such discussion and participation, the effects can be insidious.
The resulting technology may advance private interests at the price of public interests, protect one right at the expense of another, and often obscure government and corporate aims and the fundamental political decisions that have been made. This talk proposes a detailed framework for saving governance-by-design. It examines recent battles to embed policy in technology design to identify recurring dysfunctions of governance-by-design efforts in existing policy-making processes and institutions. It closes by offering a framework to guide "governance-by-design" that surfaces and resolves value disputes in technological design, while preserving rather than subverting public governance and public values.
Deirdre K. Mulligan is an Associate Professor in the School of Information at UC Berkeley, a faculty Director of the Berkeley Center for Law & Technology, and an affiliated faculty member of the Hewlett-funded Berkeley Center for Long-Term Cybersecurity. Mulligan’s research explores legal and technical means of protecting values such as privacy, freedom of expression, and fairness in emerging technical systems. Her book, Privacy on the Ground: Driving Corporate Behavior in the United States and Europe, a study of privacy practices in large corporations in five countries, conducted with UC Berkeley Law Prof. Kenneth Bamberger, was recently published by MIT Press. Mulligan and Bamberger received the 2016 International Association of Privacy Professionals Leadership Award for their research contributions to the field of privacy protection. She is a member of the Defense Advanced Research Projects Agency's Information Science and Technology study group (ISAT) and a member of the National Academy of Sciences Forum on Cyber Resilience. She is Chair of the Board of Directors of the Center for Democracy and Technology, a leading advocacy organization protecting global online civil liberties and human rights; a founding member of the standing committee for the AI 100 project, a 100-year effort to study and anticipate how the effects of artificial intelligence will ripple through every aspect of how people work, live and play; and a founding member of the Global Network Initiative, a multi-stakeholder initiative to protect and advance freedom of expression and privacy in the ICT sector, and in particular to resist government efforts to use the ICT sector to engage in censorship and surveillance in violation of international human rights standards. She is a Commissioner on the Oakland Privacy Advisory Commission.
Mulligan chaired a series of interdisciplinary visioning workshops on Privacy by Design with the Computing Community Consortium to develop a shared interdisciplinary research agenda. Prior to joining the School of Information, she was a Clinical Professor of Law, founding Director of the Samuelson Law, Technology & Public Policy Clinic, and Director of Clinical Programs at the UC Berkeley School of Law.
Mulligan was the Policy lead for the NSF-funded TRUST Science and Technology Center, which brought together researchers at UC Berkeley, Carnegie Mellon University, Cornell University, Stanford University, and Vanderbilt University; and a PI on the multi-institution NSF-funded ACCURATE center. In 2007 she was a member of an expert team charged by the California Secretary of State to conduct a top-to-bottom review of the voting systems certified for use in California elections. This review investigated the security, accuracy, reliability and accessibility of electronic voting systems used in California. She was a member of the National Academy of Sciences Committee on Authentication Technology and Its Privacy Implications, the Federal Trade Commission's Federal Advisory Committee on Online Access and Security, and the National Task Force on Privacy, Technology, and Criminal Justice Information. She was a vice-chair of the California Bipartisan Commission on Internet Political Practices and chaired the Computers, Freedom, and Privacy (CFP) Conference in 2004. She co-chaired Microsoft's Trustworthy Computing Academic Advisory Board with Fred B. Schneider from 2003 to 2014. Prior to Berkeley, she served as staff counsel at the Center for Democracy & Technology in Washington, D.C.
In the context of building predictive models, predictability is usually considered a blessing. After all, that is the goal: build the model that has the highest predictive performance. The rise of 'big data' has in fact vastly improved our ability to predict human behavior thanks to the introduction of much more informative features. However, in practice things are more differentiated than that. For many applications, the relevant outcome is observed for very different reasons: one customer might churn because of the cost of the service, another because he is moving out of coverage. In such mixed scenarios, the model will automatically gravitate to the scenario that is easiest to predict, at the expense of the others. This holds even if the predictable scenario is by far the less common or less relevant one. We present a number of applications where this happens: clicks on ads being performed 'intentionally' vs. 'accidentally', consumers visiting store locations vs. their phones pretending to be there, and finally customers filling out online forms vs. bots defrauding the advertising industry. The implications of this effect are significant: the introduction of highly informative features can have a significantly negative impact on the usefulness of predictive modeling and potentially create second-order biases in the predictions.
Claudia Perlich started her career in Data Science at the IBM T.J. Watson Research Center, concentrating on research in data analytics and machine learning for complex real-world domains and applications. She tends to be domain agnostic having worked on almost anything from Twitter, DNA, server logs, CRM data, web usage, breast cancer, movie ratings and many more. More recently she acted as the Chief Scientist at Dstillery where she designed, developed, analyzed, and optimized machine learning that drives digital advertising to prospective customers of brands. Claudia continues to be an active public speaker and has published over 50 scientific publications as well as a few patents in the area of machine learning. She has won many data mining competitions and awards at Knowledge Discovery and Data Mining (KDD) conferences, and served as the organization's General Chair in 2014. Claudia is the past winner of the Advertising Research Foundation's (ARF) Grand Innovation Award and has been selected for Crain's New York's 40 Under 40 list, Wired Magazine's Smart List, and Fast Company's 100 Most Creative People. She received her PhD in Information Systems from the NYU Stern School of Business where she still teaches as an adjunct professor.
The data science revolution is finally enabling the development of infectious disease models offering predictive tools in the area of health threats and emergencies. Analogous to meteorology, large-scale data-driven models of infectious diseases provide real- or near-real-time forecasts of the size of epidemics, their risk of spreading, and the dangers associated with uncontained disease outbreaks. These models are not only valuable because they predict where and how an epidemic might spread in the next few weeks, but also because they provide rationales and quantitative analysis to support public health decisions and intervention plans. At the same time, the non-incremental advance of the field presents a broad range of challenges: algorithmic ones (multiscale constitutive equations, scalability, parallelization) and the real-time integration of novel digital data streams (social networks, participatory platforms for disease monitoring, human mobility, etc.). I will review and discuss recent results and challenges in the area, ranging from applied analysis for public health practice to foundational computational and theoretical challenges.
Alessandro Vespignani is the Sternberg Family Distinguished University Professor at Northeastern University. He is the founding director of the Network Science Institute and leads the Laboratory for the Modeling of Biological and Socio-technical Systems. Vespignani’s recent work focuses on data-driven computational modeling and forecasting of emerging infectious diseases; resilience of complex networks; and collective behavior of techno-social systems. Vespignani is an elected fellow of the American Physical Society, a member of the Academy of Europe, and a fellow of the Institute for Quantitative Social Sciences at Harvard University. He has served on the boards and in the leadership of a variety of professional associations, journals, and the Institute for Scientific Interchange Foundation.