CSCI6410/CSCI4148/EPAH6410: Applied Research in Health Data Science / Summer 2021-2022
Updates
- Written Proposal Deadline Today
Course Description
This course is an introduction to the application of data science methods to health data within interdisciplinary research contexts. Students will be introduced to the main types of health data and their principal analysis methods while developing key research skills specific to effectively working at the intersection of medicine and computer science. This will encompass developing technical skills in the robust/reproducible analysis of data from medical databases, radiological imaging, electronic medical records, and physiological time-series data. Students will also gain specific training in developing interdisciplinary health data science research proposals including key considerations such as research ethics, data legislation, knowledge translation, and effective collaboration.
Note: this course will be updated for the 2023 offering:
- New practical assignments (more closely aligned to the lecture material)
- Additional material/assignment document for the proposal with examples
- Potentially graduate-student-driven paper selection later in the course
- General cleaning/streamlining of lectures
2022 Course Details
- M/W/F: 1300-1500, 1201 Mona Campbell Building
- Office: 4239 Mona Campbell Building, Studley Campus
- Office Hours: Wednesday 1500-1700 (following synchronous session)
- Email: finlay.maguire@dal.ca
- Syllabus
Course Learning Outcomes
The aim of this course is to provide students with the skills and knowledge required to plan effective research in the application of data science approaches to medical data. Specifically, by the end of the course students will:
- Understand the four principal sources and types of medical data: longitudinal databases (tabular), electronic medical records (structured, semi-structured, and unstructured text), radiological imaging (image), and physiological data (signal and time-series).
- Identify and apply the appropriate type of method to the analysis of each data type.
- Gain the technical skills necessary for effective health data science research, including data management, reproducibility, and version control.
- Understand the key collaborative, legal, ethical, and knowledge translation concepts required in interdisciplinary health data science research.
- Critically appraise research literature in health data science.
- Combine these skills to develop high-quality collaborative health data science research proposals.
Required Text(s)
- R for Data Science by Wickham & Grolemund (freely available from authors)
- Hands-On Machine Learning with R by Boehmke & Greenwell (freely available from authors)
- Text Mining with R by Julia Silge & David Robinson (freely available from authors)
Course Format
This course is formatted around a mixture of didactic lectures (Mondays), assessed in-class practical exercises (Fridays), and tutorials around primary literature (Wednesday 1st half). These tutorials will involve a rotation of students presenting papers and class-based discussion of the strengths/weaknesses and key methodological take-aways of the presented work. Additionally, the main assessment output for the course will be a collaboratively developed research proposal (based on the MicroResearch model) supported by in-class time and proposal/research-skills related tutorials (Wednesday 2nd half).
Minimal Technical Requirements
This course will require access to an internet-enabled computer capable of installing and running RStudio (>1.0). RStudio is freely available and can be installed from here.
Course Pre-requisites, Co-requisites, Exclusions and/or other Restrictions
Students should have some previous programming experience (ideally with R) and a knowledge of basic machine learning and/or statistical methods. For graduate students this will be self-certified, and relevant additional training material can be found in the course readings. For undergraduate students, pre-requisites of CSCI2110 and either STAT2060 or CSCI2360 are required.
Basic pre-course familiarisation with R is recommended for all students, e.g., completion of the Harvard-Chan School Bioinformatics training module.
Course Rationale and/or Other Restrictions and Requirements
This course is designed as an elective course for graduate students in Community Health & Epidemiology and graduate/advanced undergraduate students in the Faculty of Computer Science interested in working at the intersection of medicine and computer science. The primary goals of this course are to (i) provide an overview of the main types of medical data, (ii) introduce key analysis methods for each data type, and (iii) build skills necessary for effective interdisciplinary research in this area. This will complement existing non-cross-listed/co-located CS and CH&E courses by providing students with an introduction to a wide range of concepts across those courses or an opportunity to apply those skills within a growing interdisciplinary research context. Within FCS, it will complement existing courses focused on specific analysis methods (e.g., CSCI 6504/6505/6509/6515/6612) and research-skills-focused courses (e.g., CSCI 6055/6061). Similarly, within CH&E, it will complement technical-skill-focused courses (e.g., CH&E 6054/6056) as well as research training (e.g., HINF 6020.03/CH&E 8040) by supporting specific training at their intersection. Given the topic, goals, and cross-listing/co-locating of this course, it will be well placed to form part of the Master’s of Digital Innovation.