Towards Causal, Explainable and Universal Medical Visual Diagnosis

CVPR 2019 Workshop, Long Beach, CA

Hyatt Beacon Ballroom B

June 16th, 2019


Medical visual diagnosis has been gaining increasing research interest and is widely recognized in both academia and industry as an area of high impact and potential. In addition to classical problems such as medical image segmentation, abnormality detection and personalized diagnosis that benefit from the combination of deep learning approaches and big data, a more challenging goal of causal, explainable and universal medical visual diagnosis has been prompted by the recent availability of large-scale medical data and realistic industrial need. Specifically, medical decisions are usually made by collective analysis of multiple sources such as images, clinical notes, and lab tests, as well as combined human intelligence empowered by medical literature and domain knowledge. A single data-driven decision is insufficient for interactive assistance in a clinical setting. A wider explanation of how and why the decision is made (e.g. causality and visual groundings), and a deeper rationale for whether it can be justified by medical domain knowledge and personalized patient disease evolution, are desired and necessary.

In particular, modern medical visual diagnosis poses greater challenges, including 1) explicit modeling or providing explainable evidence on causality among medical entities and events; 2) generalizing beyond experience with the combined strengths of “end-to-end” learning and structured universal domain knowledge; 3) dynamic representation and unification across multiple domains such as vision, language, knowledge graphs and decision making. Furthermore, we argue that this human-like intelligence can be more thoroughly explored in the recent surge of multi-modal tasks such as single-image and time-series medical image report generation, medical relational and causality learning, and reinforcement learning for robust and unified diagnostic systems.

The goal of this workshop is thus to allow researchers from machine learning, healthcare, and other disciplines to exchange ideas, advance an integrative reconciliation between theoretical analysis and industrial deployment, and potentially shape the future of this area.

Important Dates

Paper Submission Deadline May 1 2019
Notification to Authors May 17 2019
Camera-Ready Deadline June 1 2019
Workshop Date June 16 2019


Schedule

Introduction 8:30-8:40am
Le Lu:

"Deep Learning and Big Data Exploration for Preventive and Precision Medicine in Radiology"

Ruslan Salakhutdinov:

"Incorporating Domain Knowledge into Deep Learning"

Devi Parikh:

"Towards Grounded, Explainable Vision + Language Models"

Coffee Break 10:35-10:40am
Dina Katabi:

"From Wearables to Invisibles!"

Deva Ramanan 11:20am-12:00pm
Conclusion 12:00-12:10pm

Invited Speakers

Ruslan Salakhutdinov is a UPMC professor of Computer Science in the Machine Learning Department, School of Computer Science at Carnegie Mellon University. Ruslan's primary interests lie in deep learning, machine learning, and large-scale optimization. His main research goal is to understand the computational and statistical principles required for discovering structure in large amounts of data. He is an action editor of the Journal of Machine Learning Research and served on the senior programme committee of several learning conferences including NIPS and ICML. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Connaught New Researcher Award, Google Faculty Award, Nvidia's Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.

Devi Parikh is an assistant professor in the School of Interactive Computing at Georgia Tech, and a Research Scientist at Facebook AI Research (FAIR). Her recent work involves exploring problems at the intersection of vision and language, and leveraging human-machine collaboration for building smarter machines. She has also worked on other topics such as ensemble of classifiers, data fusion, inference in probabilistic models, 3D reassembly, barcode segmentation, computational photography, interactive computer vision, contextual reasoning, hierarchical representations of images, and human-debugging. She is a recipient of an NSF CAREER award, an IJCAI Computers and Thought award, a Sloan Research Fellowship, an Office of Naval Research (ONR) Young Investigator Program (YIP) award, to name a few.

Abstract: Machines can often convincingly describe an image in a natural language sentence, answer a free-form question about an image, or hold a conversation with a human about an image. However, careful inspection reveals that these models often rely on superficial language correlations from training data. I will talk about some of our efforts towards making these models ground their predictions in the image content.  Part of what is exciting about problems at the intersection of vision and language is the possibility that they open up for humans to collaborate with machines. Towards the end of my talk, I will briefly mention some recent work I am excited about in using explainable AI to teach humans new concepts, and in creative AI to augment humans' expressive power. 

Dina Katabi is the Andrew & Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT. She is also the director of MIT’s Center for Wireless Networks and Mobile Computing, a member of the National Academy of Engineering, and a recipient of the MacArthur Genius Award. Professor Katabi received her PhD and MS from MIT in 2003 and 1999, and her Bachelor of Science from Damascus University in 1995. Katabi's research focuses on innovative mobile and wireless technologies with particular application to digital health. Her research has been recognized by the ACM Grace Murray Hopper Award, the SIGCOMM Test of Time Award, the Faculty Research Innovation Fellowship, a Sloan Fellowship, the NBX Career Development chair, and the NSF CAREER award. Her students received the ACM Best Doctoral Dissertation Award in Computer Science and Engineering twice.

Abstract: This talk will introduce Emerald, a new technology that monitors health and vital signs without any wearable sensors. The Emerald device is a Wi-Fi-like box that transmits low-power radio signals and analyzes their reflections using neural networks. It infers the movements, breathing, heart rate, falls, sleep apnea, and sleep stages of people in the home -- all without requiring them to wear any sensors or wearables. Since wireless signals traverse walls, the device can monitor multiple rooms in the home. By monitoring a variety of physiological signals continuously and without imposing a burden on users, Emerald can automatically detect degradation in health, enabling early intervention and improving health outcomes.

Le Lu has joined PingAn Technology US Research Labs, after more than five productive years at National Institutes of Health, Clinical Center, Radiology and Imaging Science Department, and most recently the NVIDIA AI-Infra division. He pursues how to push modern medical image understanding and semantic parsing to fit into revolutionary clinical workflow practices. His recent honors include the Best Summer Intern Mentor Award 2013, NIH Clinical Center (only one from his institute); the Postbac Mentor of the Year Award 2015, NIH (one of three awardees in NIH); and the NIH Clinical Center CEO Award for Research Excellence and Impacts on Patient Care, 2017. He was a Senior Staff Scientist, Image Analytics and Informatics, at Siemens Corporate Research, from Nov. 2011 until Jan. 2013. He was a Staff Scientist at Siemens Computer Aided Diagnosis (CAD) Group, Siemens Medical Solutions at Malvern, Pennsylvania, from Nov. 2009 to Oct. 2011.

Abstract: Recent progress has been evident in applying deep learning principles to large quantities (e.g., at hospital scale) of clinical imaging and text databases. However, in modern academic hospitals, there are tremendous amounts of unstructured patient data scattered among different clinical databases (PACS, BTRIS, RIS, CRIS, etc.), which mostly remain non-indexable and non-searchable to a semantic degree, and are not yet a useful means of tackling the quantitative precision healthcare challenge at scale. In this talk, I will review some of our recent research work in two aspects: 1) a general view on the research studies and insights for three key problems to solve: detection (computer-aided diagnosis/detection), semantic and anatomical segmentation (for precision quantitative imaging), and “big data, weak label” robust deep learning paradigms; 2) organizing and exploiting a large quantity of clinically significant image findings by learning a deep feature representation, and building deep semantic hierarchical (ontology-preserving) lesion similarity embedding over more than 10 thousand patient studies, to permit personalized precision medicine in radiology.

Deva Ramanan is an associate professor at the Robotics Institute at Carnegie Mellon University and the lead of Perception at Argo AI. Prior to joining CMU, he was an associate professor at UC Irvine. His research interests span computer vision and machine learning, with a focus on visual recognition. He was awarded the David Marr Prize in 2009, the PASCAL VOC Lifetime Achievement Prize in 2010, an NSF Career Award in 2010, the UCI Chancellor's Award for Excellence in Undergraduate Research in 2011, the PAMI Young Researcher Award in 2012, and the Longuet-Higgins Prize in 2018 for fundamental contributions in computer vision, and was named one of Popular Science's Brilliant 10 researchers in 2012. His work is supported by NSF, ONR, DARPA, as well as industrial collaborations with Intel, Google, and Microsoft.


Organizers

Xiaodan Liang
Associate Professor, Sun Yat-sen University
Christy Y. Li
Ph.D. Candidate, Duke University
Hao Wang
Postdoctoral Research Associate, Massachusetts Institute of Technology
Zhiting Hu
Ph.D. Candidate, Carnegie Mellon University
Ricardo Henao
Assistant Professor, Duke University
Lawrence Carin
Professor and Vice Provost for Research, Duke University & Chief Scientist, Infinia ML
Eric Xing
Co-Founder, Petuum & Professor, Carnegie Mellon University

Call for Papers

Call for papers: We invite research work on tasks related to medical visual diagnosis or tasks involving healthcare applications. Paper topics may include but are not limited to:

  • Time series medical analysis (e.g., time-series medical image report generation, medical visual dialogue diagnosis, and personalized patient time-to-event modeling.)
  • Explainable diagnosis (e.g., human interpretable visualization of medical image diagnosis systems, and development of inherently interpretable medical diagnosis models.)
  • Causality and relational learning (e.g., explicit causality learning of medical events, and relational learning of diseases, abnormalities, procedures and medication.)
  • Multimodal learning (e.g., a unified medical system that can incorporate data from multiple domains such as images, language, knowledge graphs, and decision making.)
  • Multi-task learning (e.g., a clinical system that is capable of various functionalities, and that can benefit from a joint learning of different but related problems with shared modalities.)

Submission: we encourage submissions in two tracks:

  • Workshop papers: up to 6 pages excluding references and acknowledgements.
  • Extended abstracts: up to 2 pages excluding references and acknowledgements.
Papers will be reviewed single-blind. Accepted papers will be presented as posters in the workshop. Several selected papers will be orally presented in the workshop. Submissions should be in the CVPR format and submitted via the EasyChair Submission Link by May 1 2019. We also encourage submissions of relevant work that has been published in other conferences. If your submission has already been accepted for publication, please indicate in your email the name of the conference.


General information: mvdcvpr2019general[AT]
Paper submission: EasyChair Submission Link