Data Engineer

Job Type: Full-time
Deadline: 08 Mar 2021

The Institute

The Centro Nacional de Análisis Genómico (CNAG-CRG) is one of the largest Genome Sequencing Centers in Europe. CNAG-CRG researchers participate in major International Genomic Initiatives such as the International Cancer Genome Consortium (ICGC), the International Human Epigenome Consortium (IHEC), the International Rare Diseases Research Consortium (IRDiRC) and the European Infrastructure for life-science information (ELIXIR), as well as in several EU-funded projects.

It is integrated with the Centre for Genomic Regulation (CRG), an international biomedical research institute of excellence based in Barcelona, Spain, with more than 400 scientists from 44 countries. The CRG is composed of an interdisciplinary, motivated and creative scientific team, supported both by a flexible and efficient administration and by high-end, innovative technologies.

In November 2013, the Centre for Genomic Regulation (CRG) received the 'HR Excellence in Research' logo from the European Commission. This recognises the Institute's commitment to developing an HR Strategy for Researchers, designed to bring its practices and procedures in line with the principles of the European Charter for Researchers and the Code of Conduct for the Recruitment of Researchers (Charter and Code).

Please check out our Recruitment Policy.

The role

The CNAG-CRG is looking for a data engineer to contribute to the development of the RD-Connect Genome-Phenome Analysis Platform (GPAP), using a mix of languages and tools to process and analyse multi-omics data. Technologies currently in use include Spark/Hail, Python, Scala, Hadoop, Elasticsearch, Postgres, DataSHIELD and Opal. The work will be mostly geared towards the infrastructure needs of the Solve-RD project, a large-scale European initiative towards solving rare diseases.

The selected candidate will work on automating the dataflow from sequencers and user submissions to the HDFS/Ceph and Elasticsearch clusters. She/he will build data infrastructure to support the analysis of clinical, genomics and other omics data within a personalised medicine framework.


  • Participate in the design, development and maintenance of the data engineering components of the RD-Connect GPAP
  • Automate workflows
  • Integrate with other databases/platforms (data federation and distributed Machine Learning)
  • Collaborate with data analysts and software engineers
  • Interact with national and international partners

About the team

The selected candidate will join the Bioinformatics Analysis Unit led by Dr Sergi Beltran. The multidisciplinary 20-member Unit is focused on NGS data analysis and tool development, mostly related to human health. The Unit develops the RD-Connect GPAP and participates, among others, in Solve-RD, EJP-RD, Genomed4All, 3TR, ELIXIR, MatchMaker Exchange, GA4GH and Clúster de Valorització d'EGA per a la Indústria i la Societat (VEIS).
