About Me

Adèle Helena Ribeiro

Postdoctoral Researcher
Data Science in Biomedicine
Philipps-Universität Marburg

About me

I am a postdoc in Dominik Heider's research group at Philipps-Universität Marburg and a visiting researcher at Heinrich Heine University Düsseldorf, Germany. Previously, I worked with Elias Bareinboim as a postdoc in the Causal AI Lab at Columbia University, USA. My research lies at the intersection of Computer Science, Statistics, and Artificial Intelligence in Healthcare. My efforts are focused on advancing the theory of causal inference and learning for discovering, generalizing, and personalizing cause-effect relationships from multiple observational and experimental data collections. I am also interested in the development and application of machine learning and AI tools equipped with causal and counterfactual reasoning for more fair, explainable, scalable, reliable, and personalized decision-making. I have a particular interest in applications in the Health Sciences and have directed my research towards addressing challenges that emerge in such domains to help bridge the gap between theory and practical applications.

Research Interests

  • Causal Inference
  • Explainable AI
  • Structure Learning
  • Deep Learning
  • Statistical Genetics
  • Multi-Omics Analysis
  • Computational Neuroscience
  • Health and Medical Research

Resume

Education and Professional Preparation

  • Oct 2022

    Postdoctoral Scholar

    Laboratory of Data Science in Biomedicine
    Philipps-Universität Marburg
    Marburg, Hesse, Germany
    Project: Causal Data Science and Machine Learning in Biomedicine.
    Advisor: Prof. Dominik Heider
  • Sep 2019

    Postdoctoral Scholar

    Causal Artificial Intelligence Laboratory
    Data Science / Computer Science Institutes
    Columbia University
    New York, NY, USA
    Project: Causal Inference in the Health Sciences: from Biased and Heterogeneous Data Collections to Personalized and Improved Patient Outcomes.
    Advisor: Prof. Elias Bareinboim
  • Feb 2019

    Postdoctoral Scholar

    Laboratory of Genetics and Molecular Cardiology
    Heart Institute (InCor)
    University of Sao Paulo
    Sao Paulo, SP, Brazil
    Project: Deep Learning for 12-lead ECG Classification.
    Advisor: Prof. José Eduardo Krieger
  • Nov 2018

    Doctor of Philosophy in
    Computer Science

    Institute of Mathematics and Statistics
    University of Sao Paulo (IME-USP)
    Sao Paulo, SP, Brazil
    PhD dissertation: Identification of Causality in Genetics and Neuroscience
    Advisor: Prof. André Fujita
    Co-Advisor: Prof. Júlia Maria Pavan Soler
  • Fall 2017

    Doctoral Research Internship

    Neuroscience Institute
    Princeton University
    Princeton, NJ, USA
    Project: Deep learning-based pose representation and dynamics modeling of marmoset monkeys.
    Advisor: Prof. Asif A. Ghazanfar
  • Jun 2014

    Master of Science in
    Computer Science

    Institute of Mathematics and Statistics
    University of Sao Paulo (IME-USP)
    Sao Paulo, SP, Brazil
  • Dec 2011

    Bachelor of Science in Computational
    and Applied Mathematics

    Institute of Mathematics and Statistics
    University of Sao Paulo (IME-USP)
    Sao Paulo, SP, Brazil
    Senior thesis: Analysis of Pyroelectric Infrared (PIR) sensor output signals.
    Advisor: Prof. Roberto Hirata Jr.

Fellowships and Scholarships

Sep 2021 DAAD Postdoc-NeT-AI Fellowship
DAAD Artificial Intelligence Networking (AInet) Fellowship
Federal Ministry of Education and Research, Germany
Sep 2019 - Aug 2022 Postdoctoral Research Fellowship
Causal Artificial Intelligence Lab
Department of Computer Science & Data Science Institute, Columbia University, New York, NY, USA
Jan 2019 - Aug 2019 Postdoctoral Research Fellowship
Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil
Sep 2017 - Dec 2017 PhD Visiting Student at Princeton University
Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil
Aug 2014 - Jul 2018 PhD Graduate Research Scholarship
Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil
Mar 2012 - Feb 2014 MSc Graduate Research Scholarship
Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil

Publications

Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.
The electrocardiogram (ECG) serves as a valuable diagnostic tool, providing crucial information about life-threatening cardiac conditions such as atrial fibrillation and myocardial infarction. A prompt and efficient assessment of ECG exams in environments such as Emergency Rooms (ERs) can significantly enhance the chances of survival for high-risk patients. Despite the presence of numerous works on ECG classification, most of these studies have concentrated on one-dimensional ECG signals, which are commonly found in publicly available ECG datasets. Nevertheless, the practical relevance of such methods is limited in hospital settings, where ECG exams are usually stored as images. In this study, we have developed an artificial intelligence-driven screening system specifically designed to analyze 12-lead ECG images. Our proposed method has been trained on an extensive dataset comprising 99,746 12-lead ECG exams collected from the ambulatory section of a tertiary hospital. The primary goal was to precisely classify the exams into three classes: Normal (N), Atrial Fibrillation (AFib), and Other (O). The evaluation of our approach yielded AUROC scores of 93.2%, 99.2%, and 93.1% for N, AFib, and O, respectively. To further validate our approach, we conducted evaluations using the 2018 China Physiological Signal Challenge (CPSC) database. In this evaluation, we achieved AUROC scores of 91.8%, 97.5%, and 70.4% for the classes N, AFib, and O, respectively. Additionally, we assessed our method using 1,074 exams acquired in the ER and obtained AUROC values of 98.3%, 98.0%, and 97.7% for the classes N, AFib, and O, respectively. Furthermore, we developed and deployed a system with a trained model within the ER of a tertiary hospital for research purposes. This system automatically retrieves newly captured ECG chart images from the Picture Archiving and Communication System (PACS) within the ER. These images undergo necessary preprocessing steps and serve as input for our proposed classification method. This comprehensive approach established an efficient and versatile end-to-end framework for ECG classification. The results of our study highlight the potential of leveraging artificial intelligence in the screening of ECG exams, offering a promising solution for the rapid assessment and prioritization of patients in the ER.
Artificial intelligence (AI) and data sharing go hand in hand. In order to develop powerful AI models for medical and health applications, data need to be collected and brought together over multiple centers. However, due to various reasons, including data privacy, not all data can be made publicly available or shared with other parties. Federated and swarm learning can help in these scenarios. However, in the private sector, such as between companies, the incentive is limited, as the resulting AI models would be available for all partners irrespective of their individual contribution, including the amount of data provided by each party. Here, we explore a potential solution to this challenge as a viewpoint, aiming to establish a fairer approach that encourages companies to engage in collaborative data analysis and AI modeling. Within the proposed approach, each individual participant could gain a model commensurate with their respective data contribution, ultimately leading to better diagnostic tools for all participants in a fair manner.
Reasoning about the effect of interventions and counterfactuals is a fundamental task found throughout the data sciences. A collection of principles, algorithms, and tools has been developed for performing such tasks in the last decades (Pearl, 2000). One of the pervasive requirements found throughout this literature is the articulation of assumptions, which commonly appear in the form of causal diagrams. Despite the power of this approach, there are significant settings where the knowledge necessary to specify a causal diagram over all variables is not available, particularly in complex, high-dimensional domains. In this paper, we introduce a new graphical modeling tool called cluster DAGs (for short, C-DAGs) that allows for the partial specification of relationships among variables based on limited prior knowledge, alleviating the stringent requirement of specifying a full causal diagram. A C-DAG specifies relationships between clusters of variables, while the relationships between the variables within a cluster are left unspecified, and can be seen as a graphical representation of an equivalence class of causal diagrams that share the relationships among the clusters. We develop the foundations and machinery for valid inferences over C-DAGs about the clusters of variables at each layer of Pearl’s Causal Hierarchy (Pearl and Mackenzie 2018; Bareinboim et al. 2020): L1 (probabilistic), L2 (interventional), and L3 (counterfactual). In particular, we prove the soundness and completeness of d-separation for probabilistic inference in C-DAGs. Further, we demonstrate the validity of Pearl’s do-calculus rules over C-DAGs and show that the standard ID identification algorithm is sound and complete to systematically compute causal effects from observational data given a C-DAG. Finally, we show that C-DAGs are valid for performing counterfactual inferences about clusters of variables.
The fields of continual learning and causality investigate complementary aspects of human cognition and are fundamental components of artificial intelligence if it is to reason and generalize in complex environments. Despite the burgeoning interest in the intersection of the two fields, it is currently unclear how causal models may describe continuous streams of data and, vice versa, how continual learning may exploit learned causal structure. We proposed to bridge this gap through the inaugural AAAI-23 “Continual Causality” bridge program, where our aim was to take the initial steps towards a unified treatment of these fields by providing a space for learning and discussion and by building a diverse community to connect researchers. The activities ranged from traditional tutorials and software labs to invited vision talks, contributed talks based on submitted position papers, a panel, and breakout discussions. The materials are publicly available as a foundation for the community at https://www.continualcausality.org, and the ideas, challenges, and prospects discussed beyond the inaugural bridge are summarized in this retrospective paper.
One common task in many data science applications is to answer questions about the effect of new interventions, such as: what would happen to Y if we make X equal to x while observing covariates Z = z? Formally, this is known as conditional effect identification, where the goal is to determine whether a post-interventional distribution is computable from the combination of an observational distribution and assumptions about the underlying domain represented by a causal diagram. A plethora of methods has been developed for solving this problem, including the celebrated do-calculus [Pearl, 1995]. In practice, these results are not always applicable since they require a fully specified causal diagram as input, which is usually not available. In this paper, we assume as the input of the task a less informative structure known as a partial ancestral graph (PAG), which represents a Markov equivalence class of causal diagrams, learnable from observational data. We make the following contributions under this relaxed setting. First, we introduce a new causal calculus, which subsumes the current state-of-the-art, PAG-calculus. Second, we develop an algorithm for conditional effect identification given a PAG and prove it to be both sound and complete. In words, failure of the algorithm to identify a certain effect implies that this effect is not identifiable by any method. Third, we prove the proposed calculus to be complete for the same task.
Atrial fibrillation (AF) is a common arrhythmia (0.5% worldwide prevalence) associated with an increased risk of various cardiovascular disorders, including stroke. Automated routine AF detection by electrocardiogram (ECG) is based on the analysis of one-dimensional ECG signals and requires dedicated software for each type of device, limiting its wide use, especially with the rapid incorporation of telemedicine into the healthcare system. Here, we implement a machine learning method for AF classification using the region of interest (ROI) corresponding to the long DII lead automatically extracted from DICOM 12-lead ECG images. We observed 94.3%, 98.9%, 99.1%, and 92.2% for sensitivity, specificity, AUC, and F1 score, respectively. These results indicate that the proposed methodology performs similarly to methods that take one-dimensional ECG signals as input, but does not require dedicated software, facilitating its integration into clinical practice, as ECGs are typically stored in PACS as 2D images.
Graphs/networks have become a powerful analytical approach for data modeling. Besides, with the advances in sensor technology, dynamic time-evolving data have become more common. In this context, one point of interest is a better understanding of the information flow within and between networks. Thus, we aim to infer Granger causality (G-causality) between networks' time series. In this case, the straightforward application of the well-established vector autoregressive model is not feasible. Consequently, we require a theoretical framework for modeling time-varying graphs. One possibility would be to consider a mathematical graph model with time-varying parameters (assumed to be random variables) that generates the network. Suppose we identify G-causality between the graph models' parameters. In that case, we could use it to define a G-causality between graphs. Here, we show that even if the model is unknown, the spectral radius is a reasonable estimate of some random graph model parameters. We illustrate our proposal's application to study the relationship between brain hemispheres of controls and children diagnosed with Autism Spectrum Disorder (ASD). We show that the G-causality intensity from the brain's right to the left hemisphere is different between ASD and controls.
Many challenging problems in biomedical research rely on understanding how variables are associated with each other and influenced by genetic and environmental factors. Probabilistic graphical models (PGMs) are widely acknowledged as a very natural and formal language to describe relationships among variables and have been extensively used for studying complex diseases and traits. In this work, we propose methods that leverage observational Gaussian family data for learning a decomposition of undirected and directed acyclic PGMs according to the influence of genetic and environmental factors. Many structure learning algorithms are strongly based on a conditional independence test. For independent measurements of normally distributed variables, conditional independence can be tested through standard tests for zero partial correlation. In family data, the assumption of independent measurements does not hold since related individuals are correlated due to mainly genetic factors. Based on univariate polygenic linear mixed models, we propose tests that account for the familial dependence structure and allow us to assess the significance of the partial correlation due to genetic (between-family) factors and due to other factors, denoted here as environmental (within-family) factors, separately. Then, we extend standard structure learning algorithms, including the IC/PC and the really fast causal inference (RFCI) algorithms, to Gaussian family data. The algorithms learn the most likely PGM and its decomposition into two components, one explained by genetic factors and the other by environmental factors. The proposed methods are evaluated by simulation studies and applied to the Genetic Analysis Workshop 13 simulated dataset, which captures significant features of the Framingham Heart Study.
Faced with the lack of reliability and reproducibility in omics studies, more careful and robust methods are needed to overcome the existing challenges in multi-omics analysis. In conventional omics data analysis, signal intensity values (denoted by M and A values) are estimated neglecting pixel-level uncertainties, which may reflect noise and systematic artifacts. For example, intensity values from two-color microarray data are estimated by taking the mean or median of the pixel intensities within the spot and then subjected to a within-slide normalization by LOWESS. Thus, focusing on estimation and normalization of gene expression profiles, we propose a spot quantification method that takes into account pixel-level variability. Also, to preserve relevant variation that may be removed in LOWESS normalization with poorly chosen parameters, we propose a parameter selection method that is parsimonious and considers intrinsic characteristics of microarray data, such as heteroskedasticity. The usefulness of the proposed methods is illustrated by an application to real intestinal metaplasia data. Compared with the conventional approaches, the analysis is more robust and conservative, identifying fewer but more reliable differentially expressed genes. Also, the variability preservation allowed the identification of new differentially expressed genes. Using the proposed approach, we have identified differentially expressed genes involved in pathways in cancer and confirmed some molecular markers already reported in the literature.
Causal inference may help us to understand the underlying mechanisms and the risk factors of diseases. In Genetics, it is crucial to understand how the connectivity among variables is influenced by genetic and environmental factors. Family data have proven to be useful in elucidating genetic and environmental influences; however, few existing approaches are capable of addressing structure learning of probabilistic graphical models (PGMs) and family data analysis jointly. We propose methodologies for learning, from observational Gaussian family data, the most likely PGM and its decomposition into genetic and environmental components. They were evaluated by a simulation study and applied to the Genetic Analysis Workshop 13 simulated data, which mimic the real Framingham Heart Study data, and to the metabolic syndrome phenotypes from the Baependi Heart Study. In neuroscience, one challenge consists in identifying interactions between functional brain networks (FBNs), which are represented as graphs. We propose a method to identify Granger causality among FBNs. We show the statistical power of the proposed method by simulations and its usefulness by two applications: the identification of Granger causality between the FBNs of two musicians playing a violin duo, and the identification of a differential connectivity from the right to the left brain hemispheres of autistic subjects.
Blood pressure (BP) is associated with carotid intima-media thickness (CIMT), but few studies have explored the association between BP variability and CIMT. We aimed to investigate this association in the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil) baseline. We found a small but significant association between SBP variability and CIMT values. This was additive to the association between SBP central tendency and CIMT values, supporting a role for high short-term SBP variability in atherosclerosis.
A major challenge in biomedical research is to identify causal relationships among genotypes, phenotypes, and clinical outcomes from high-dimensional measurements. Causal networks have been widely used in systems genetics for modeling gene regulatory systems and for identifying causes and risk factors of diseases. In this chapter, we describe fundamental concepts and algorithms for constructing causal networks from observational data. In a biological context, causal inferences can be drawn from the natural experimental setting provided by Mendelian randomization, a term that refers to the random assignment of genotypes at meiosis. We show that genetic variants may serve as instrumental variables, improving estimation accuracy of the causal effects. In addition, identifiability issues that commonly arise when learning network structures may be overcome by using prior information on genotype–phenotype relations.
Any measurement, since it is made by a real instrument, has an uncertainty associated with it. In the present work, we address this issue of uncertainty in two-channel cDNA microarray experiments, a technology that has been widely used in recent years and is still an important tool for gene expression studies. Tens of thousands of gene representatives are printed onto a glass slide and hybridized simultaneously with mRNA from two different cell samples. Different fluorescent dyes are used for labeling both samples. After hybridization, the glass slide is scanned, yielding two images. Image processing and analysis programs are used for spot segmentation and pixel statistics computation, for instance, the mean, median, and variance of pixel intensities for each spot. The same statistics are computed for the pixel intensities in the background region. Statistical estimators such as the variance give us an estimate of the accuracy of a measurement. Based on the intensity estimates for each spot, some data transformations are applied in order to eliminate systematic variability so we can obtain the effective gene expression. This paper shows how to analyze gene expression measurements with an estimated error. We present an estimate of this uncertainty and study, in terms of error propagation, the effects of some data transformations. An example of data transformation is the correction of the bias estimated by a robust local regression method, also known as lowess. With the propagated errors obtained, we also show how to use them for detecting differentially expressed genes between different conditions. Finally, we compare the results with those obtained by classical analysis methods, in which the measurement errors are disregarded. We conclude that modeling the measurement uncertainties can improve the analysis, since the results obtained on a real gene expression database were consistent with the literature.
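As a rough illustration of the kind of error propagation the abstract above refers to, a first-order (delta-method) approximation for the log-ratio of two channel intensities can be written as follows. The symbols R and G (spot intensities of the two channels) and their standard deviations are notation assumed here for illustration, and the formula further assumes independent channel errors; it is a textbook approximation, not necessarily the estimator derived in the paper.

```latex
M = \log_2 R - \log_2 G,
\qquad
\sigma_M^2 \;\approx\; \frac{1}{(\ln 2)^2}
\left( \frac{\sigma_R^2}{R^2} + \frac{\sigma_G^2}{G^2} \right)
```

Under this approximation, spots with low intensity relative to their pixel-level spread carry a larger uncertainty in M, which is why disregarding measurement errors can overstate the evidence for differential expression.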

Open-Source Libraries

This package implements the CIDP and IDP algorithms for identifying (conditional) causal effects from a Partial Ancestral Graph (PAG). Technical details are provided in the NeurIPS 2022 paper by Jaber A., Ribeiro A. H., Zhang J., and Bareinboim E. (2022), entitled "Causal Identification under Markov equivalence: Calculus, Algorithm, and Completeness".
This package provides methods for learning, from observational Gaussian family data (i.e., Gaussian data clustered in families), Gaussian undirected and directed acyclic PGMs describing linear relationships among multiple phenotypes and a decomposition of the learned PGM into unconfounded genetic and environmental PGMs. Methods are based on zero partial correlation tests derived in the work by Ribeiro and Soler (2020).
This package provides methods for estimating and normalizing the M (intensity log-ratio) and A (mean log intensity) values from two-channel (or two-color) microarrays. Unlike conventional estimation methods, which take into account only measures of location (e.g., mean and median) of the pixel intensities of each channel, the provided estimation method takes into account pixel-level variability, which may reflect uncertainties due to noise and systematic artifacts.
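For readers unfamiliar with M and A values, the sketch below computes them in the conventional way from per-spot location summaries of each channel. It is only the standard definition, not the variability-aware estimator the package provides; the function name `ma_values` and the `red`/`green` channel labels are assumptions made for this illustration.

```python
import numpy as np

def ma_values(red, green):
    """Conventional M and A values from per-spot channel intensities.

    red, green: positive spot intensities (e.g., mean or median of the
    pixel intensities within each spot) for the two channels.
    """
    log_r = np.log2(np.asarray(red, dtype=float))
    log_g = np.log2(np.asarray(green, dtype=float))
    m = log_r - log_g          # intensity log-ratio
    a = 0.5 * (log_r + log_g)  # mean log intensity
    return m, a

# Two hypothetical spots: one with a 4-fold difference, one with none.
m, a = ma_values([8.0, 4.0], [2.0, 4.0])
print(m, a)
```

Because each spot is reduced to a single number per channel before M and A are formed, any pixel-level spread is discarded at this step, which is the limitation the package description points to.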

Participation in Conferences

Networks are everywhere, from social to biological sciences. Usually these networks are represented by graphs, i.e., mathematical objects composed of a set of vertices and a set of edges. However, a vast number of natural networks are dynamic and current methods typically ignore a third key component: time. This fact requires statistical approaches to analyze them appropriately.

In this context, we propose a methodology to identify Granger causality among graphs. By assuming that graphs are generated by models whose parameters are random variables, we define that a time series of graphs y_{i,t} does not Granger cause another time series of graphs y_{j,t} if the parameters of the model for y_{i,t} do not Granger cause the parameters of the model for y_{j,t}. The problem is that the models that generate the graphs are usually unknown and consequently the parameters cannot be estimated. However, for some random graph models, such as Erdős-Rényi, geometric, regular, Watts-Strogatz, and Barabási-Albert, it is known that the spectral radius (the largest eigenvalue of the adjacency matrix of the graph) is a function of the model parameters. For example, for the Erdős-Rényi random graph model, which is defined by the parameters n, the number of vertices, and p, the probability that two random vertices are connected, the spectral radius is known to be approximately np.

Based on this idea, we propose to identify Granger causality between time series of graphs by fitting a vector autoregressive model (VAR) to the time series of spectral radii. By an extensive simulation study, we show that the methodology has good accuracy, particularly for large graphs and long time series. In addition, we show that the spectral radius performed better than other centrality measures, such as degree, eigenvector, betweenness, and closeness centralities. Finally, we applied the methodology to identify Granger causality between brain networks.
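The idea above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: it simulates two series of Erdős-Rényi graphs whose edge probabilities are linked with a one-step delay, reduces each graph to its spectral radius, and compares a restricted and a full lag-1 autoregression with an F statistic in place of a full VAR fit. All function names, the graph size, the series length, and the coefficients are hypothetical choices.

```python
import numpy as np

def er_adjacency(n, p, rng):
    """Sample the symmetric adjacency matrix of an Erdős-Rényi graph G(n, p)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def spectral_radius(adj):
    """Largest absolute eigenvalue of a symmetric adjacency matrix."""
    return float(np.max(np.abs(np.linalg.eigvalsh(adj))))

def _rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f_stat(x, y, lag=1):
    """F statistic for 'x Granger-causes y' using `lag` past values of each series."""
    T = len(y)
    target = y[lag:]
    own = np.column_stack([y[lag - k:T - k] for k in range(1, lag + 1)])
    other = np.column_stack([x[lag - k:T - k] for k in range(1, lag + 1)])
    ones = np.ones((T - lag, 1))
    restricted = np.hstack([ones, own])       # y explained by its own past
    full = np.hstack([ones, own, other])      # ... plus the past of x
    rss_r, rss_f = _rss(restricted, target), _rss(full, target)
    dof = len(target) - full.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

rng = np.random.default_rng(0)
n, T = 100, 200                               # graph size and series length (assumed)
p_x, p_y = np.full(T, 0.3), np.full(T, 0.3)
for t in range(1, T):
    p_x[t] = 0.3 + 0.5 * (p_x[t - 1] - 0.3) + rng.normal(0, 0.03)
    p_y[t] = 0.3 + 0.8 * (p_x[t - 1] - 0.3) + rng.normal(0, 0.01)  # driven by past p_x
p_x, p_y = np.clip(p_x, 0.05, 0.95), np.clip(p_y, 0.05, 0.95)

# Replace each graph by its spectral radius, a proxy for n * p.
radii_x = np.array([spectral_radius(er_adjacency(n, p, rng)) for p in p_x])
radii_y = np.array([spectral_radius(er_adjacency(n, p, rng)) for p in p_y])

f_xy = granger_f_stat(radii_x, radii_y)
print("F statistic, x -> y:", f_xy)
```

A large F statistic for the x -> y direction indicates that past spectral radii of the first series improve the prediction of the second. The reverse direction is not shown because the spectral radius is only a noisy proxy for p, so it should be interpreted with care.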
To unravel the biological mechanism underlying complex traits and diseases, it is crucial to understand how the related phenotypes are associated with each other and how they are influenced by genetic and environmental factors. Probabilistic graphical models (PGMs) are widely used to describe relationships among variables (phenotypes) in a very intuitive and mathematically rigorous way. On the other hand, family-based studies are usually conducted to assess the influence of genetic and environmental factors on phenotypes. In this case, the polygenic model can be used to decompose the phenotypic variability into two variance components: one polygenic, for capturing the variability across families, and one environmental, for capturing the residual variability. Some algorithms for learning PGMs from observational data, known as structure learning algorithms, are strongly based on a conditional independence test. Considering the case where the observations are independent and normally distributed, the null hypothesis of conditional independence can be tested using classical tests for zero partial correlation and PGMs can be learned under Markov-properties equivalence. However, in family-based studies, measurements of related individuals are correlated and such dependence structure must be taken into account to obtain appropriate test statistics.

Based on the Gaussian univariate polygenic model, we derived an estimator for the partial correlation coefficient that takes into account the family dependence structure and presented a decomposition of the partial correlation coefficient according to the contribution of the genetic and environmental effects. Also, we derived zero partial correlation tests for these coefficients and extended the approach of Meinshausen and Bühlmann (2006), which learns undirected PGMs from vertex neighborhoods, and the IC (Pearl, 2000) / PC (Spirtes et al., 2000) algorithm, which learns directed PGMs, for learning genetic and environmental PGMs from observational family data. The performance of the proposed methodologies was assessed by using 100 replicates of simulated data, based on the Framingham Heart Study, provided by the Genetic Analysis Workshop (GAW) 13 in problem 2.
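For context, the classical zero partial correlation test that this work extends can be sketched as follows. The sketch assumes independent, normally distributed observations, which is exactly the assumption that fails for family data, so it does not implement the family-adjusted tests derived by the authors. The function names and the simulated chain X -> Z -> Y are hypothetical.

```python
import math
import numpy as np

def partial_correlation(data, i, j, cond):
    """Sample partial correlation of columns i and j given the columns in cond."""
    idx = [i, j] + list(cond)
    cov = np.cov(data[:, idx], rowvar=False)
    prec = np.linalg.inv(cov)                 # precision matrix of the selected columns
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation = 0.

    n: number of independent observations; k: number of conditioning variables.
    """
    z = 0.5 * math.log((1 + r) / (1 - r))     # Fisher z-transform
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))     # equals 2 * (1 - Phi(stat))

# Simulated chain X -> Z -> Y: X and Y are dependent, but independent given Z.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = x + rng.normal(size=n)
y = z + rng.normal(size=n)
data = np.column_stack([x, y, z])

r_marginal = partial_correlation(data, 0, 1, [])
r_given_z = partial_correlation(data, 0, 1, [2])
print("marginal:", r_marginal, fisher_z_pvalue(r_marginal, n, 0))
print("given Z: ", r_given_z, fisher_z_pvalue(r_given_z, n, 1))
```

Structure learning algorithms such as PC call a test of this kind repeatedly to decide which edges to remove. When observations come from related individuals, the effective sample size behind `n` is overstated, which motivates tests that account for the family dependence structure.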
Beer is part of human history, going back to the legacies left by the ancient Sumerians, Egyptians, Mesopotamians, and Iberians since at least 6000 BC. Nevertheless, far from being a settled process, beer production constantly evolves and improves, to the point that it now drives a rapidly expanding craft industry which, owing to its many intrinsic sources of variability, fuels the curious and creative spirit of the alchemist and the sensory refinement of individuals regardless of age, gender, social condition, etc.

We saw in this universe a wide window for awakening enthusiasm for learning among third-year undergraduate Statistics students in the Design of Experiments course (MAE 0317), who immediately and vigorously embraced the proposal of brewing beer as a cross-cutting illustrative vehicle for the concepts and tools covered in the course. We therefore formalized a joint project, to be planned and executed during the first semester of 2017, in the classroom and in the field, involving the professor, the students, the teaching assistants, and beer production specialists.

The idea is to statistically relate responses that measure beer quality, such as density, foam stability, and sensory experience (body, bitterness, sweetness, aroma, clarity, etc.), to sources of variability that can be controlled experimentally, such as cooking temperature, carbonation, and maturation. Considering the preliminary results obtained so far and the prospects that have emerged, we believe the project makes it possible to shape how students perceive the course content, turning the learning of dense theoretical concepts into a pleasant, stimulating, and interactive experience.
A challenging task in biomedical research is to understand precisely the complex network of causal associations among phenotypes and outcomes. Experimental studies such as clinical trials are the most trustworthy method of causality assessment. However, it may be unfeasible to carry out randomized experiments to discover all possible causal relationships when the number of variables is large. In systems genetics, causal inference is supported by Mendelian randomization, which provides a natural randomization process where genotypes, rather than treatments, are randomly allocated to individuals. Furthermore, genetic variants robustly associated with phenotypes can be seen as instrumental variables, allowing inferences on the causal relation between phenotypes and outcomes.

In this work, we conducted a comparative study of four recent algorithms that use genetic variants as instrumental variables for learning the structure of a genotype-phenotype network, namely, (i) QTL-directed Dependency Graph (QDG), (ii) QTL-driven phenotype network (QTLnet), (iii) Sparsity-aware Maximum Likelihood (SML), and (iv) QTL+Phenotype Supervised Orientation (QPSO). These algorithms are similar in that they use QTL information to determine the causal direction among phenotypes. However, they were designed under different assumptions, so some may be more suitable than others for a particular biological application. Through simulation studies, we investigated the advantages and limitations of these methodologies under different configurations. Finally, we applied the algorithms to real data involving cardiovascular phenotypes of F2 rats and compared the inferred causal networks.
Massage therapies are associated with improvements in pathological conditions and have also been used extensively for esthetic purposes. This study aimed to evaluate some of the molecular mechanisms involved in massage by investigating the modulation of gene expression associated with cell adhesion and the extracellular matrix (ECM) induced by esthetic massage combined with a cosmetic emulsion. Thirteen female volunteers clinically characterized as having grade II or III cellulite were recruited and underwent skin biopsies in the gluteofemoral region before and after treatment. Each volunteer’s leg was considered an experimental unit to reduce individual variability. The study population was divided into (1) legs treated with a cosmetic emulsion and (2) legs treated with a cosmetic emulsion and massage. Analysis of 84 genes by qPCR revealed a predominance of up-regulation in individuals treated with the emulsion and massage compared with individuals treated only with the emulsion (fold change > 1.5 and p < 0.05). The main genes modulated were ECM proteases (ADAMTS8, MMP1, MMP3, MMP9 and MMP11), transmembrane molecules (HAS1, ITGAL), adhesion molecules (COL8A1 and LAMA1), and cell-matrix adhesion molecules (ADAMTS13). In conclusion, combining the cosmetic emulsion with massage is recommended to increase the effectiveness of a product and obtain the desired benefits in the treatment of skin disorders such as cellulite. The lack of scientific data on this technique often leads to skepticism among health professionals and even among patients or consumers of cosmetic treatments. This study helps to elucidate some of the molecular phenomena associated with this therapy.
Most analyses of two-color microarray data are based on point estimation of the log-ratio of the two channel intensities. These estimates, commonly named M values, are conventionally obtained from some location measure of the pixel intensities of each channel, ignoring any imprecision. It is well known that the microarray technology is associated with many noise sources, and it has been shown that improved inferences can be obtained by including the inaccuracies involved and propagating them to downstream analysis. Using the multivariate delta method, we propose new estimators for the mean and the variance of the M values that take into account the probe-level inaccuracies in the analysis.

Invited Talks and Tutorials

Jan 2024 Tropical Probabilistic AI School - Tropical ProbAI 2024
3-hour Tutorial
Hosted with the EMAp FGV Summer School on Data Science 2024, Rio de Janeiro, Brazil
Tutorial on GitHub.
Ribeiro, A. H. Introduction to Causal Inference.
Aug 2023 Seminar at FGV EMAp
Invited Talk
School of Applied Mathematics of Getulio Vargas Foundation (FGV EMAp), Rio de Janeiro, Brazil.
Ribeiro, A. H. Recent Advances in Causal Inference under Limited Domain Knowledge.
Jul 2023 European Summer School on Artificial Intelligence - ESSAI 2023
5-day Course
Faculty of Computer and Information Science, University of Ljubljana, Slovenia
Ribeiro, A. H., Dhami, D., and Zecevic, M. Machines Climbing Pearl's Ladder of Causation.
Jul 2023 13th Lisbon Machine Learning School - LxMLS 2023
3-hour Tutorial
Instituto Superior Técnico, Lisbon, Portugal
Ribeiro, A. H. Causality and its Role in Reasoning, Explainability, and Generalizability.
Jun 2023 Nordic Probabilistic AI School - ProbAI 2023
3-hour Tutorial
Norwegian University of Science and Technology (NTNU), Trondheim, Norway
Tutorial on GitHub.
Ribeiro, A. H. Causal Inference: Towards Explainable, Generalizable, and Trustworthy AI.
Apr 2023 Workshop on Causal Representation Learning
Invited Talk
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Ribeiro, A. H. Effect Identification in Cluster Causal Diagrams.
Feb 2023 Continual Causality - Bridge Program at AAAI-2023
90-min Tutorial
Walter E. Washington Convention Center, Washington DC, USA
Ribeiro, A. H. Putting the Causality in Continual Causality.
Aug 2022 DAAD Postdoc-NeT-AI Tour, Germany
Invited Talks
Institute of Information Systems & Institute for Medical Biometrics and Statistics at the University of Lübeck;
Institute for Computational Systems Biology at the University of Hamburg;
Centre for Cognitive Science at TU Darmstadt;
Center for Systems Biology and Department of Computer Science at TU Dresden; and
Helmholtz Center Munich
Ribeiro, A. H. Causal Inference from Observational Data in Partially Understood Domains.
Aug 2022 Future Bioinformatics Workshop, Germany
Invited Talk
Ribeiro, A. H. Causal AI: Towards Explainable, Generalizable, and Trustworthy Decision-Making.
Jul 2022 12th Lisbon Machine Learning Summer School (LxMLS - 2022)
Invited 3-hour Tutorial
Ribeiro, A. H., Bareinboim, E. Causal Data Science.
Jun 2022 Columbia DSI Scholars - Summer Research Bootcamp 2022
Data Science Institute, Columbia University
Invited Talk
Ribeiro, A. H. An Overview on Causal Data Science.
May 2022 Interinstitutional Graduate Program in Statistics (PIPGES)
Federal University of São Carlos and University of São Paulo
Invited Talk
Ribeiro, A. H. Causal Effect Identification in Partially Understood Domains.
Mar 2022 Voices of Data Science at UMass Amherst
Manning College of Information & Computer Sciences, University of Massachusetts Amherst
Invited Talk
Ribeiro, A. H. On the Importance of Causal Inference in the Next Generation of Artificial Intelligence.
Mar 2022 Causal Inference Learning Group
Biostatistics Department, Mailman School of Public Health, Columbia University
Invited Talk
Ribeiro, A. H. Effect Identification in Cluster Causal Diagrams.
Dec 2021 WHY-21 at NeurIPS 2021 - Causal Inference & Machine Learning: Why now?
Invited Talk
Ribeiro, A. H. Effect Identification in Cluster Causal Diagrams.
Nov 2021 Laboratory of Epidemiology & Population Science (LEPS) at the National Institute on Aging (NIA)
Invited Talk
Ribeiro, A. H. Causal Inference and the Data-Fusion Problem.
Nov 2021 OECD workshop on AI and the productivity of science.
Invited Talk
Ribeiro, A. H., Bareinboim, E. Developing causal AI: its importance and an overview.
Sep 2021 Graduate Seminars Series - Statistics
Statistics Department, University of Brasilia - UnB, Brasilia, Brazil
Invited Lecture
Ribeiro, A. H. Causal Inference and Data-Fusion.
Jul 2021 11th Lisbon Machine Learning Summer School (LxMLS - 2021)
Invited 3-hour Tutorial
Ribeiro, A. H., Bareinboim, E. Causal Data Science: An Introduction to Causal Inference and Data-Fusion.
Jun 2021 Perspectives in Statistics
Statistics Department, University of Sao Paulo (IME - USP), Sao Paulo, SP, Brazil
Invited Lecture
Ribeiro, A. H. Causal Inference from Observational Studies.
Dec 2020 Seventy-Sixth (76th) Annual Deming Conference on Applied Statistics.
Invited 3-hour Tutorial
Ribeiro, A. H., Adibuzzaman, M., Bareinboim, E. Causal Inference in the Health Sciences.
Nov 2020 American Medical Informatics Association (AMIA 2020) Virtual Annual Symposium.
Contributed 3.5-hour Tutorial
Ribeiro, A. H., Adibuzzaman, M., Bareinboim, E. Causal Inference in the Health Sciences.
Oct 2020 Graduate Seminars Series - Biostatistics and Biometrics
Sao Paulo State University - UNESP, Botucatu, SP, Brazil
Invited Lecture
Ribeiro, A. H. Causal Inference from Observational Studies.
May 2019 Graduate Seminars Series - Statistics
Federal University of Sao Carlos and University of Sao Paulo (UFSCar - USP), Sao Carlos, SP, Brazil
Invited Lecture
Ribeiro, A. H. Learning Genetic and Environmental Probabilistic Graphical Models from Gaussian Family Data.
Jan 2017 Graduate Summer School - Sao Paulo State University - UNESP, Presidente Prudente, SP, Brazil
9-hour Short Course
Ribeiro, A. H., Soler, J.M.P. Dimensionality Reduction and Structure Learning with Applications to Genomics.
May 2016 61a Reunião Anual da Região Brasileira da Sociedade Internacional de Biometria (RBras), Salvador, BA, Brazil
4-hour Short Course
Ribeiro, A. H., Soler, J.M.P. Dimensionality Reduction Applied to Genomics.

Appearances in Popular Media

Oct 2021 “How I would like to continue my research...”
Interview by Klaus Rathje on the DAAD Postdoctoral Networking Tour "AI in Medicine".
May 2021 Developing and Applying Causal Inference Methods in Public Health
Interview by Karina Alexanyan, Ph.D., for the Data Science Institute at Columbia University.

Teaching

Lecturer

Oct 2023 - Feb 2024 Department of Computer Science, Heinrich Heine University Düsseldorf, Germany
Causality.
Mar 2023 - Oct 2023 Department of Mathematics and Computer Science, Philipps-Universität Marburg, Germany
Causal Data Science: Theoretical Foundations and Algorithms.

Assistant Professor

Feb 2018 - Jul 2018 Computer Engineering Department - Institute of Education and Research (Insper), Sao Paulo, SP, Brazil.
Software Design using Python

Teaching Assistant

Mar 2017 - Jul 2017 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Statistical Design of Experiments
Aug 2016 - Dec 2016 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Multivariate Data Analysis
Mar 2016 - Jul 2016 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Statistical Methods for Genetics and Genomics
Aug 2015 - Dec 2015 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Multivariate Data Analysis
Mar 2015 - Jul 2015 Architecture and Urbanism College - University of Sao Paulo (FAU-USP), Sao Paulo, SP, Brazil.
Mathematics, Architecture and Design
Aug 2014 - Dec 2014 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Statistical Techniques, Programming and Simulation
Mar 2014 - Jul 2014 Institute of Astronomy, Geophysics and Atmospheric Sciences - University of Sao Paulo (IAG-USP), Sao Paulo, SP, Brazil.
Numerical Calculus with Applications in Physics
Aug 2013 - Dec 2013 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Mathematical Modeling
Mar 2013 - Jul 2013 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Introduction to Computer Programming
Aug 2012 - Dec 2012 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Linear Programming
Mar 2012 - Jul 2012 Institute of Mathematics and Statistics - University of Sao Paulo (IME-USP), Sao Paulo, SP, Brazil.
Numerical Methods for Linear Algebra