White papers
A Comprehensive Review of AI Myths and Misconceptions (2023)
Promoting AI literacy allows more people to participate in the discussion about the benefits and costs of artificial intelligence (AI) technology, how and when it should be used, and what we want our future with AI to look like in general. Myths and misconceptions about AI can impede these debates and, at worst, lead to poor decisions or actions. To avoid this, this review clarifies common myths and misconceptions about AI through simple explanations and remarks. The review is written for a broad audience because a realistic understanding of AI technology benefits everyone.
Successful Communication of Complex Information (2023)
Complex information must be conveyed accurately and clearly. This requires both mastery of the subject and effective communication skills. This 12-page guide aims to improve the latter. It is primarily written for technical experts (e.g., data scientists) who need to communicate their findings to diverse stakeholders in business. However, the best practices and strategies in this guide are useful beyond this scenario. Good communication has many universal benefits, including improved relationships. I hope you enjoy it!
[Paper]
Selected scholarly publications
Structuring Uncertainty for Fine-Grained Sampling in Stochastic Segmentation Networks
In this work, we explore Stochastic Segmentation Networks, a deep learning architecture for image segmentation. These networks also predict segmentation uncertainty, which we structure into meaningful, non-redundant components (see the bottom row in the image). By reweighting the individual contributions of the components, the overall segmentation can be adjusted in a controlled way. This can be done through a user interface that we provide alongside the code.
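The reweighting idea can be illustrated with a minimal sketch. It assumes, as in the usual formulation of Stochastic Segmentation Networks, that the logits follow a Gaussian with a low-rank-plus-diagonal covariance, and it treats each column of the low-rank factor as one uncertainty component. The function name `sample_logits`, the toy sizes, and the random parameters are hypothetical; the way the components are made meaningful and non-redundant in the paper is not reproduced here.

```python
import numpy as np

def sample_logits(mean, factors, diag, weights, rng):
    """Draw one logit sample with per-component weights.

    Assumed model: logits ~ N(mean, P diag(w)^2 P^T + diag(d)),
    where the columns of P (factors) are the uncertainty components.
    """
    rank = factors.shape[1]
    z = rng.standard_normal(rank)              # one latent draw per component
    eps = rng.standard_normal(mean.shape[0])   # independent per-logit noise
    return mean + factors @ (weights * z) + np.sqrt(diag) * eps

# Hypothetical toy problem: 4 pixels, 2 classes, 2 uncertainty components.
rng = np.random.default_rng(0)
n_pixels, n_classes, rank = 4, 2, 2
mean = rng.standard_normal(n_pixels * n_classes)
factors = rng.standard_normal((n_pixels * n_classes, rank))
diag = np.full(n_pixels * n_classes, 1e-2)

# Keep the first component at full strength and switch the second one off.
weights = np.array([1.0, 0.0])
logits = sample_logits(mean, factors, diag, weights, rng)
segmentation = logits.reshape(n_pixels, n_classes).argmax(axis=1)
print(segmentation)
```

Setting a weight to zero removes that component's contribution to the sample, while values above one exaggerate it. A user interface could expose one such weight per component.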
Leveraging the Wikipedia Graph for Evaluating Word Embeddings
Deep learning for natural language processing (NLP) often relies on pre-trained word embeddings, that is, vector representations of words. A typical evaluation of these embeddings checks how well they capture word similarities. In this work, we measure similarity by routing through the Wikipedia hyperlink graph, which encodes word similarities as edges between articles. Our approach not only avoids the costly human creation of similarity data sets, but also extends to other languages available on Wikipedia.
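The following toy sketch illustrates the general idea of comparing embedding similarities with distances in a hyperlink graph. Everything in it is an assumption made for illustration: the five-article graph, the two-dimensional embeddings, the use of breadth-first hop counts as the graph distance, and the use of a Pearson correlation as the agreement score. It is not the similarity measure or evaluation protocol from the paper.

```python
from collections import deque
import numpy as np

def shortest_path_length(graph, source, target):
    """Breadth-first search on an unweighted graph; returns the hop count or None."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical hyperlink graph: article title -> linked article titles.
graph = {
    "cat": ["dog", "pet"],
    "dog": ["cat", "pet"],
    "pet": ["cat", "dog", "house"],
    "house": ["pet", "car"],
    "car": ["house"],
}

# Hypothetical 2-dimensional word embeddings.
embeddings = {
    "cat": np.array([1.0, 0.1]),
    "dog": np.array([0.9, 0.2]),
    "pet": np.array([0.8, 0.4]),
    "house": np.array([0.3, 0.9]),
    "car": np.array([0.1, 1.0]),
}

pairs = [("cat", "dog"), ("cat", "pet"), ("cat", "house"), ("cat", "car")]
hops = [shortest_path_length(graph, a, b) for a, b in pairs]
sims = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]

# Fewer hops should go with higher cosine similarity, so correlate with -hops.
agreement = np.corrcoef(-np.array(hops), np.array(sims))[0, 1]
print(hops, agreement)
```

A high agreement score means that words whose articles are close in the graph also have similar embeddings. Because the graph is built from hyperlinks alone, the same procedure applies to any language edition without hand-labeled similarity data.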
Robust principal component analysis for generalized multi-view models
Principal component analysis (PCA) is susceptible to data corruption. A more robust approach decomposes a data matrix into (1) a low-rank component for the principal components, e.g., stable background, and (2) a sparse component for the data corruption, e.g., clouds as a moving foreground. This paper shows that for grouped measurements (e.g., multi-channel pixels), the decomposition can be recovered exactly when only the corrupted data matrix is given.
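As an illustration of the low-rank plus sparse decomposition, here is a minimal sketch of standard principal component pursuit solved with an augmented Lagrangian scheme. It is the classical entrywise formulation, not the grouped (multi-view) model analyzed in the paper. The parameter choices for `lam` and `mu` are common defaults from the robust PCA literature, and the synthetic test data are assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entrywise shrinkage: proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Singular value shrinkage: proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

def robust_pca(m, n_iter=500, tol=1e-7):
    """Split m into low_rank + sparse by minimizing ||L||_* + lam * ||S||_1."""
    lam = 1.0 / np.sqrt(max(m.shape))        # common default weight
    mu = m.size / (4.0 * np.abs(m).sum())    # common default penalty parameter
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    for _ in range(n_iter):
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(m):
            break
    return low_rank, sparse

# Synthetic data: a rank-2 matrix with 5% of its entries grossly corrupted.
rng = np.random.default_rng(0)
n, rank = 60, 2
L_true = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
S_true = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S_true[mask] = rng.uniform(-10.0, 10.0, size=mask.sum())
M = L_true + S_true

L_hat, S_hat = robust_pca(M)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
print(rel_err)
```

Only the corrupted matrix `M` is passed to `robust_pca`. The true components are used solely to measure how well the low-rank part was recovered.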