Overview of Ed-DaSH lessons: workshop materials


Ed-DaSH Lessons

Ed-DaSH uses three sets of lessons, mainly developed following the Carpentries Development Handbook.

Lessons developed by Ed-DaSH Teams

This lesson is a three-day introduction to the workflow manager Nextflow and to nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Nextflow enables scalable and reproducible scientific workflows using software environments such as Conda. It allows the adaptation of pipelines written in the most common scripting languages, such as Bash, R and Python. Nextflow is a Domain Specific Language (DSL) that simplifies the implementation and deployment of complex parallel and reactive workflows on clouds and clusters.

This lesson also introduces nf-core: a framework that provides a community-driven, peer-reviewed platform for the development of best-practice analysis pipelines written in Nextflow.

This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.

Full lesson resources are on the Carpentries Incubator

Researchers needing to implement data analysis workflows face a number of common challenges, including the need to organise tasks, make effective use of compute resources, handle any errors in processing, and document and share their methods. The Snakemake workflow system provides effective solutions to these problems. By the end of the course, you will be confident in using Snakemake to run real workflows in your day-to-day research.

Snakemake workflows are described by special scripts that define steps in the workflow as rules, and these are then used by Snakemake to construct and execute a sequence of shell commands to yield the desired output. Re-calculation of existing results is avoided where possible, so you can add or update input data, then efficiently generate an updated result. Workflows can be seamlessly scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition.
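
As an illustration of the idea, a single rule in a Snakefile might look something like the sketch below; the file names and the shell command are hypothetical, not taken from the lesson materials.

    # Minimal sketch of one Snakefile rule (hypothetical names).
    rule count_lines:
        input:
            "reads.fastq"        # raw data this step depends on
        output:
            "counts.txt"         # file this step promises to produce
        shell:
            "wc -l {input} > {output}"

Asking Snakemake for the target (for example, snakemake --cores 1 counts.txt) runs the shell command; asking again does nothing as long as counts.txt is still up to date with respect to reads.fastq.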

This course is primarily intended for researchers who need to automate data analysis tasks for biological research involving next-generation sequencing data, for example RNA-seq analysis, variant calling, ChIP-seq, bacterial genome assembly, etc. However, Snakemake has many uses beyond this, and the course does not assume any specialist biological knowledge. The language used to write Snakemake workflows is Python-based, but no prior knowledge of Python is required or assumed either. We do require that attendees have familiarity with the Linux command line (pipes, redirects, variables, …).

Full lesson resources are on the Carpentries Incubator

This lesson is an introduction to Conda for (data) scientists, with an emphasis on bioinformatics. Conda is an open-source package and environment management system that runs on Windows, macOS and Linux. Conda installs, runs, and updates packages and their dependencies, and it easily creates, saves, loads, and switches between environments on your local computer. While Conda was created for Python programs, it can package and distribute software written in any language, such as R, Ruby, Lua, Scala, Java, JavaScript, C/C++ and FORTRAN. This lesson motivates the use of Conda as a development tool for building and sharing project-specific software environments that facilitate reproducible bioinformatics workflows.

Full lesson resources are on the Carpentries Incubator

Open Science is disruptive. It will change how we do research and how society benefits from it. Making data re-usable is key to this, and the FAIR principles are a way to achieve it.

  • But what does it mean in practice?
  • How can a biologist incorporate those principles in their workflow?
  • We will learn that becoming FAIR and following Open Science practices is a process.
  • We will learn how to work more efficiently with data.

We will teach you how, with planning and the correct set of tools, you can make your outputs ready for public sharing and reuse.

This hands-on workshop, delivered over four half-day sessions, covers the basics of Open Science and FAIR practices and looks at how to use these ideas in your own projects. The workshop is a mix of lectures and hands-on lessons in which you will apply the approaches learned and implement some of the discussed practices.

The course is aimed at active researchers in the biomedical sciences (PhD students, postdocs, technicians, young PIs, etc.) who are interested in Open Science, FAIR (Findable, Accessible, Interoperable and Reusable) principles and efficient data management. It is intended for those who want to become familiar with these concepts and apply them throughout their project’s life cycle. The course is covered in four half days.

Full lesson resources are on the Carpentries Incubator

This workshop uses a public health dataset and examples (NHANES, from the US National Center for Health Statistics), but the materials are relevant more generally to researchers in the life, health and social sciences.

The workshop assumes no prior experience of statistical analysis in R. However, learners are expected to have some familiarity with R, such as having done an introductory course. If you do not currently have any such experience, an introductory Carpentries R course would prepare you.

Full lesson resources are on the Carpentries Incubator

This course is intended for those who have a working knowledge of statistics and linear models with R and wish to learn high-dimensional statistical methods with R.

This is a short course aimed at familiarising learners with statistical and computational methods for the extremely high-dimensional data commonly found in biomedical and health sciences (e.g., gene expression, DNA methylation, health records). These datasets can be challenging to approach, as they often contain many more features than observations, and it can be difficult to distinguish meaningful patterns from natural underlying variability. To this end, we will introduce and explain a range of methods and approaches to disentangle these patterns from natural variability. After completion of this course, learners will be able to understand, apply, and critically analyse a broad range of statistical methods. In particular, we focus on providing a strong grounding in high-dimensional regression, dimensionality reduction, and clustering.

Full lesson resources are on the Carpentries Incubator

This workshop comprises four lessons on applied machine learning in Python using health data. Lessons take participants through a typical pipeline for prediction, covering key concepts in preparing data, training models, and evaluating performance. We introduce models including decision trees and neural networks and highlight key issues in their responsible use. Prior knowledge of Python (for example, gained through a Carpentries course) is beneficial, but not required.
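
For a rough flavour of such a pipeline, the sketch below prepares a dataset, trains a decision tree with scikit-learn and evaluates it. It uses a built-in toy dataset rather than the health data from the workshop, so the dataset and parameters here are illustrative only.

    # Illustrative sketch only: prepare data, train a model, evaluate performance.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Prepare the data: split into training and test sets.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Train a decision tree classifier.
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X_train, y_train)

    # Evaluate performance on held-out data.
    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))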

Full lesson resources are on the Carpentries Incubator
