+351 220402959


Porto, Portugal

About me

Education and professional summary.


Pedro G. Ferreira graduated in Systems and Informatics Engineering (2002) and completed a PhD in Artificial Intelligence at the University of Minho (2007). He was a Postdoctoral Fellow at the Centre for Genomic Regulation, Barcelona (2008-2012) and at the University of Geneva (2012-2014). He has participated in several major international consortia, including ICGC-CLL, ENCODE, GEUVADIS and GTEx. Currently, he is an Assistant Professor at the Department of Computer Science, Faculty of Sciences of the University of Porto, and a researcher at INESC-TEC/LIAAD and i3s/Ipatimup. His main research focus is genomic data science. In particular, he is interested in unraveling the role of genomics in human health and disease. He has been involved in several bioinformatics start-ups.

  • Name : Pedro G. Ferreira
  • Email :
  • Nationality : Portuguese
  • Phone : +351220402959
  • Position : Assistant Professor
  • Affiliation : Department of Computer Science, Faculty of Sciences, University of Porto
  • Work Address : FC6, Rua do Campo Alegre, 4169-007 Porto
  • Researcher : INESC-TEC and Ipatimup/i3s
My Experience
Assistant Professor - Department of Computer Science, Faculty of Sciences, University of Porto
2019 - present

Teaches in 1st-cycle, 2nd-cycle (M:CC, M:IERSI and M:BBC) and 3rd-cycle (MAPi) programmes.

Senior Researcher - INESC-TEC
2020 - present

Member of the Artificial Intelligence and Decision Support Laboratory.

Associate Researcher - Ipatimup/i3s
2015 - 2019

Position funded by an FCT Investigator Starting Grant (overall success rate 15.1%).

Senior Bioinformatician - CBR Genomics
2014 - 2015

Genomics-as-a-Service company for personalized medicine.

Habilitation in Computer Science - Faculty of Sciences, University of Porto

Advanced Computing Training - University of Texas at Austin

Visiting Researcher at Texas Advanced Computing Center & Dell Medical School. Fellowship from UTAustin|Portugal.

Post-doctoral Fellow - University of Geneva, School of Medicine
2012 - 2014

Supervision: E. Dermitzakis.

Post-doctoral Fellow - Centre for Genomic Regulation
2008 - 2012

Supervision: R. Guigó. Supported by a Postdoctoral fellowship from FCT Portugal (2008-2010).

PhD in Artificial Intelligence - University of Minho
2003 - 2007

Thesis: Sequence Pattern Mining in Biochemical Data. Supervision: P. Azevedo. Supported by a PhD fellowship from FCT Portugal.

Bachelor Degree (5 Years) - University of Minho
1997 - 2002

Systems and Informatics Engineering.


Two books that I co-authored with Miguel Rocha: one on data analysis with R and one on bioinformatics algorithms with Python.

Análise e Exploração de Dados em R

Bioinformatics Algorithms: Design and Implementation in Python


Selected papers from recent years, representative of my main research lines.


The landscape of expression and alternative splicing variation across human traits

García-Pérez, R; ....; Ferreira, PG; Ardlie, KG; Melé, M

Cell Genomics


Understanding the consequences of individual transcriptome variation is fundamental to deciphering human biology and disease. We implement a statistical framework to quantify the contributions of 21 individual traits as drivers of gene expression and alternative splicing variation across 46 human tissues and 781 individuals from the Genotype-Tissue Expression project. We demonstrate that ancestry, sex, age, and BMI make additive and tissue-specific contributions to expression variability, whereas interactions are rare. Variation in splicing is dominated by ancestry and is under genetic control in most tissues, with ribosomal proteins showing a strong enrichment of tissue-shared splicing events. Our analyses reveal a systemic contribution of types 1 and 2 diabetes to tissue transcriptome variation with the strongest signal in the nerve, where histopathology image analysis identifies novel genes related to diabetic neuropathy. Our multi-tissue and multi-trait approach provides an extensive characterization of the main drivers of human transcriptome variation in health and disease.


Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Moreno, M; Vilaça, R; Ferreira, PGC;

BMC Bioinformatics


Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures are made available online. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures. Keywords: Machine learning, Scalable data science, Gene expression, Transcriptomics, Data analysis


Deep learning for drug response prediction in cancer

Baptista, D; Ferreira, PG; Rocha, M;

Briefings in Bioinformatics


Predicting the sensitivity of tumors to specific anti-cancer treatments is a challenge of paramount importance for precision medicine. Machine learning (ML) algorithms can be trained on high-throughput screening data to develop models that are able to predict the response of cancer cell lines and patients to novel drugs or drug combinations. Deep learning (DL) refers to a distinct class of ML algorithms that have achieved top-level performance in a variety of fields, including drug discovery. These types of models have unique characteristics that may make them more suitable for the complex task of modeling drug response based on both biological and chemical data, but the application of DL to drug response prediction has been unexplored until very recently. The few studies that have been published have shown promising results, and the use of DL for drug response prediction is beginning to attract greater interest from researchers in the field. In this article, we critically review recently published studies that have employed DL methods to predict drug response in cancer cell lines. We also provide a brief description of DL and the main types of architectures that have been used in these studies. Additionally, we present a selection of publicly available drug screening data resources that can be used to develop drug response prediction models. Finally, we also address the limitations of these approaches and provide a discussion on possible paths for further improvement.


Gender Differential Transcriptome in Gastric and Thyroid Cancers

A Sousa, M Ferreira, C OliveiraC, PG FerreiraC

Frontiers in Genetics 11, 808


Cancer has an important and considerable gender differential susceptibility confirmed by several epidemiological studies. Gastric (GC) and thyroid cancer (TC) are examples of malignancies with a higher incidence in males and females, respectively. Beyond environmental predisposing factors, it is expected that gender-specific gene deregulation contributes to this differential incidence. We performed a detailed characterization of the transcriptomic differences between genders in normal and tumor tissues from stomach and thyroid using Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) data. We found hundreds of sex-biased genes (SBGs). Most of the SBGs shared by normal and tumor belong to sexual chromosomes, while the normal and tumor-specific tend to be found in the autosomes. Expression of several cancer-associated genes is also found to differ between sexes in both types of tissue. Thousands of differentially expressed genes (DEGs) between paired tumor-normal tissues were identified in GC and TC. For both cancers, in the most susceptible gender, the DEGs were mostly under-expressed in the tumor tissue, with an enrichment for tumor-suppressor genes (TSGs). Moreover, we found gene networks preferentially associated to males in GC and to females in TC and correlated with cancer histological subtypes. Our results shed light on the molecular differences and commonalities between genders and provide novel insights in the differential risk underlying these cancers.


The effects of death and post-mortem cold ischemia on human tissue transcriptomes

PG FerreiraC, M Muñoz-Aguirre, F Reverter, CPS Godinho, A Sousa, ...

Nature communications 9 (1), 1-15


Post-mortem tissue samples are a key resource for investigating patterns of gene expression. However, the processes triggered by death and the post-mortem interval (PMI) can significantly alter physiologically normal RNA levels. We investigate the impact of PMI on gene expression using data from multiple tissues of post-mortem donors obtained from the GTEx project. We find that many genes change expression over relatively short PMIs in a tissue-specific manner, but this potentially confounding effect in a biological analysis can be minimized by taking into account appropriate covariates. By comparing ante- and post-mortem blood samples, we identify the cascade of transcriptional events triggered by death of the organism. These events do not appear to simply reflect stochastic variation resulting from mRNA degradation, but active and ongoing regulation of transcription. Finally, we develop a model to predict the time since death from the analysis of the transcriptome of a few readily accessible tissues.


Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

PG Ferreira, M Oti, M Barann, T Wieland, S Ezquina, MR Friedländer, ...

Scientific Reports 6, 32406


Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.


The human transcriptome across tissues and individuals

M Melé*, PG Ferreira*, F Reverter*, DS DeLuca, J Monlong, M Sammeth, ...

* equal contribution

Science 348 (6235), 660-665


Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes. We used RNA sequence data generated by Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show stability in postmortem samples. These signatures are dominated by a relatively small number of genes—which is most clearly seen in blood—though few are exclusive to a particular tissue and vary more across tissues than individuals. Genes exhibiting high interindividual expression variation include disease candidates associated with sex, ethnicity, and age. Primary transcription is the major driver of cellular specificity, with splicing playing mostly a complementary role; except for the brain, which exhibits a more divergent splicing program. Variation in splicing, despite its stochasticity, may play in contrast a comparatively greater role in defining individual phenotypes.


The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans

GTEx Consortium (including PG Ferreira)

Science 348 (6235), 648-660


Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.


The GTEx Consortium atlas of genetic regulatory effects across human tissues

GTEx Consortium (including PG Ferreira)

Science 369 (6509), 1318-1330


The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.


In recent years I have taught the following courses, either as course chair or as instructor for practical classes.

Foundations and Applications of Machine Learning (Doctoral Program MAPi)

Algorithms for Bioinformatics / Bioinformatics

Bioinformatics for Master in Medical Informatics

Artificial Intelligence

Imperative Programming (C Language)

Introduction to Programming (Python Language)


Supervision and co-supervision of PhD and Master Candidates.

My Blog

The latest posts about data analysis and machine learning.

1 June, 2021 #ML
Check out the conference: Cutaneous Melanoma: from where to where?

I am honoured to be part of the scientific committee of the annual conference of the Portuguese League Against Cancer.

7 June, 2021 #ML
New Degree in Artificial Intelligence and Data Science at the University of Porto

Check out the new Degree in Artificial Intelligence and Data Science at the University of Porto, which opens this year with 55 places!

1 June, 2021 #ML
Dimensionality Reduction: PCA, MDS, t-SNE and UMAP

Dimensionality reduction with matrix factorization and neighbour-graph-based methods. An application case with R.



Some of the projects I have been involved in over the last 10 years, as a collaborating researcher or as principal investigator.


Stroke in translation: Biomarkers for diagnosis and management of acute ischemic stroke

Solving the 3D chromatin structure of CDH1 locus to identify disease-associated mechanisms

Neoantigen signature algorithm to predict immunotherapy response

Tregs in cancer immune response

Life-time Risk Estimations and Genetic Modifiers Of Hereditary Diffuse Gastric Cancer

Gastric Cancer

Understanding the impact of acquired and germline genetic variants in the complexity of gastric cancer



Role: PI


Genotype-Tissue Expression (GTEx)

The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq.


Role: PI (Non-funded member of AWG)


The Geuvadis project aims to bring together knowledge and resources on medical genome sequencing at a European level and to allow researchers to develop and test new hypotheses on the genetic basis of disease.


Role: Investigator

Contact Me

Feel free to contact me regarding academic matters.


+351 220402700



Rua do Campo Alegre s/n, 4169-007 Porto