Sr. Software Engineer, Hail Team

Job Description

The Hail team’s mission is to build tools that enable rapid analysis and exploration of massive genetic datasets (tens of terabytes and tripling yearly). We are dedicated to open science, and everything we do is open source. We currently develop in Scala, Spark, Python, and C/C++ but will use any tools we need to get the job done.

You have a strong understanding of data structures and algorithms as well as an ability to quickly write clear, correct code to solve non-trivial but well-defined problems. You will contribute to the design and implementation of a distributed system that is transforming the way biologists interact with their data. Key to our success is growing a strong and diverse team whose members enable and support the development and success of one another. Self-improvement is a fundamental part of our culture; we want to grow great engineers.

In the spirit of building a diverse team, we are committed to giving equal consideration to candidates from groups underrepresented in software engineering. We know that many excellent candidates choose not to apply despite their capabilities; if that describes you, we enthusiastically encourage you to apply. We welcome applications from software engineers with at least two years of experience who are eager to grow personally and to support the growth of junior team members.

This position will primarily support the Genome Aggregation Database (gnomAD) project. gnomAD is among the most comprehensive catalogues of human genetic variation in the world, consisting of tens of thousands of whole genomes and hundreds of thousands of exomes contributed by over one hundred research groups. Analysis results are shared publicly and have had sweeping impact on biomedical research and the clinical diagnosis of genetic disorders. gnomAD is set to triple in size this year, presenting major engineering and analytic challenges with huge scientific impact.

http://gnomad.broadinstitute.org/

https://www.theatlantic.com/science/archive/2015/09/genome-big-data-disease-genes/404356/

You will:

– Work with the team, scientists, and analysts with diverse expertise to realize transformative scientific goals

– Design, write, test, tune, document, deploy, maintain, and support new features, analysis methods, and infrastructure

– Maintain computing infrastructure and software deployments

– Constructively participate in the design and review of code

Requirements:

– Bachelor’s degree in Computer Science or related field or equivalent experience

– 2+ years of industry software engineering experience

– Solid understanding of computer science fundamentals

– Facility with “tools of the trade”, e.g., Unix system administration, shell scripting, build and deployment tools, version control, etc.

– Ability to meet deadlines and work cooperatively in a multi-disciplinary environment

– Experience working with cloud deployments

Our website has a ton of material and links to our GitHub repository:

https://hail.is

If you’d like a video intro for engineers on what we’re building and why, check out our talk at Spark Summit:

https://spark-summit.org/2017/events/scaling-genetic-data-analysis-with-apache-spark/

If you’re curious how our tool is used, check out our tutorials:

https://hail.is/hail/tutorials-landing.html

EOE / Minorities / Females / Protected Veterans / Disabilities