Overview
The course will consist of paper readings, presentations, and student projects. Students write critiques, give presentations, and produce an academic paper suitable for a workshop or conference. We will review recent advances in health data analysis. Reading selections broadly cover clinical, genetic, image, and wearables data analysis. Students are expected to have a working knowledge of machine learning and data mining, along with the programming skills to carry out the implementation of a final project (preferably in Python, but all languages are welcome). The project is extremely hands-on. You will experience firsthand the challenges a data scientist works through: data ambiguity, missing data, anomalies, skewness, predictive models, descriptive models, etc.
The course consists of class sessions, paper readings and reviews, presentations, and a research project.
Each of these components of the course is discussed below, along with several other administrative issues:
Class sessions
- Class sessions combine lectures, discussions of reading, and presentations by students.
- The class is focused around discussion. Please comment, question, and interact.
Paper readings and reviews
- Classes will have one or two assigned readings, which we will all read prior to class and discuss during the class. Reading the papers is essential to get the most out of this course!
- You must submit (on Piazza) a paper review for each of the assigned readings. A one-paragraph review is sufficient. Your reviews should not summarize the paper or repeat the abstract. Instead, your review should comprise at least three comments on the paper. For example, a comment might be:
- a criticism of the paper
- an advantage of the paper that was not discussed in the paper
- a suggestion of a way to extend or build on the paper in future work
- a response to another student’s comment
- Your reviews should contain material that doesn’t appear in the other students’ reviews. (If you independently produce the same idea, that’s fine. Copying other students’ reviews, however, is plagiarism.)
- You are encouraged to read, think about, and comment on the other students’ reviews.
- Submit your review by 11:59 pm the night before the lecture for which the paper was assigned, by posting it on Piazza.
- You may skip any 2 paper reviews over the course of the semester without affecting your grade. Except in extreme cases, the instructor will not grant additional free passes on the reviews.
- Reviews that are submitted on time and meet the guidelines above will be given full credit.
Presentations
- All students must sign up for a presentation slot by writing their name on the sign-up sheet. Please choose your slot based solely on the date, not the papers, as the papers are subject to change.
- During the semester, each student will (co-)present a paper and (co-)lead a paper discussion at least once.
- If two presenters are assigned to one presentation, they work together.
- Meet with the instructors at the end of the previous class to discuss your plan. Email your slides to the instructors at least 24 hours before the presentation time. You will receive feedback via email or in person.
- Aim for about 20 minutes and, typically, 15 slides per paper: 2-3 slides on motivation and background, 5-7 slides on the core ideas of the paper, 3-5 slides on experimental data, and 2-3 slides on your thoughts, criticisms, questions, and discussion points about the paper.
- Your presentation should be original and independent. You are allowed to cooperate with your partner for your session.
- Reusing slides from someone else’s presentation should be done only if it is absolutely necessary, but should be acknowledged right at the beginning of the presentation.
- After the presentation, lead a conversation about the papers, covering topics much like those that would be covered in a paper review and the discussion panel. Integrate themes from student commentaries to spark discussion.
Research project
- The research project is one of the highlights of the course. The goal is for students to pick and execute a data-driven research project in the health domain that, by the end of the semester (mid-May), would be publishable as a short paper in a top-quality workshop like NeurIPS ML4H: Machine Learning for Health, and when expanded to a full paper would be publishable in a top-quality conference.
- We recommend you work in groups of 2-3. Larger or smaller groups should discuss with the instructor first.
- Projects must satisfy three requirements:
- address a real, health/clinical problem in a particular domain
- use a data-driven approach to model the problem
- evaluate the effectiveness of the approach
- Students are encouraged to be creative and should think broadly about problems, domains, and datasets.
- There are many different datasets to work with. More and more health data is collected with various data access policies. You will find these aggregated lists helpful:
- Project milestones: there are three project milestones, the last of which is the final paper, followed by a poster presentation:
- Milestone 1 - Project proposals - due week 5: groups should produce three potential final project proposals. After review, the instructors will provide feedback to help you choose the best project for the course. Submit the proposals to the instructors via email. Your group should send a single email, with all group members cc’d, with simple PDF attachments. Each proposal should be at most a half page of text, clearly describing each of the following:
- the health/clinical problem you plan to address
- the dataset, data-driven approach, and evaluation plan you will use first to attack the problem
- the most closely related work, and why your proposed problem is different from it or why your proposed solution is better. You should actively search for related work
- if there are multiple people on your project team, who they are and how you plan to partition the work among the team
- Milestone 2 - Midterm report - due week 8: your group should produce a draft of your final paper’s introduction. The introduction should flesh out the proposal to describe in more detail the motivation, related work, and potential contribution. You must demonstrate progress on your solution. The midterm report is worth 15% of your final course grade, so start work on the project early. We will be paying close attention to your strategies for using datasets and data-driven approaches to modeling the problem.
- Milestone 3 - Final paper - due last day of exams: for the final milestone, your group should produce a draft of the paper with results. This is a short paper suitable for submission to a workshop. It should clearly state the problem being solved, the importance of the problem, related work, your approach, evaluation and results, a summary of conclusions, a discussion of limitations, and future work. The paper should be at most 8 pages for one-person projects and at most 12 pages for two-person projects, but you will be judged on results, not page count!
- At the end of the course, during the final exam period, we will have a poster session. This will be an opportunity for the instructor to ask questions about your project, and also for other students and faculty in the department to see the cool work that you’ve done.
- Dates for the above steps are announced on the class schedule. In general, you are encouraged to meet with the instructor and seek advice on the project as often as you like.
- Can a project be shared with another course’s project or independent research? It is OK, and often a good idea, to work on a class project that complements your other ongoing projects and has a related topic. However, you should identify the piece of the larger project that you are working on for this course, with separate pieces for other courses. Check with your other instructors as well.
Grading
- Attendance and paper reviews 20%
- Presentations 30%
- Final project 50%:
- Proposal 5%
- Midterm report 15%
- Final paper 30%
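As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes each component is scored on a 0-100 scale and that the course score is a plain weighted sum. Both are assumptions made for illustration, as are the names `WEIGHTS` and `course_score`. This document does not specify a scoring scale or how scores map to letter grades.

```python
# Weights copied from the grading breakdown, as fractions of the course grade.
# The three project items (proposal, midterm report, final paper) sum to 50%.
WEIGHTS = {
    "attendance_and_reviews": 0.20,
    "presentations": 0.30,
    "proposal": 0.05,
    "midterm_report": 0.15,
    "final_paper": 0.30,
}


def course_score(scores):
    """Combine per-component scores (assumed 0-100) into a weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, for illustration only.
example = {
    "attendance_and_reviews": 100,
    "presentations": 90,
    "proposal": 80,
    "midterm_report": 85,
    "final_paper": 88,
}

print(round(course_score(example), 2))
```

With these hypothetical scores, the project components contribute 4 + 12.75 + 26.4 points out of a possible 50, which shows how heavily the final paper weighs relative to the proposal.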
Policies on ethics, attribution, and cheating
- The standard university policies on original work, cheating, and attribution apply to all work in the course. Violations may result in the course grade being lowered by one letter, a failing grade for the course, or a different final decision left to the instructor.
User data agreement
- I request access to data collected by UIUC, course instructors, and their partners for the purpose of scientific investigation, teaching, or the planning of clinical research studies, and agree to the following terms:
- I will receive access to de-identified data and will not attempt to establish the identity of, or attempt to contact any of the subjects.
- I will not attempt to make direct contact with the project’s PIs or staff at sites concerning the specific results of individual subjects.
- I will not further disclose these data beyond the uses outlined in this agreement and my data use application and understand that redistribution of data in any manner is prohibited.
- I will require anyone on my team who utilizes these data, or anyone with whom I share these data to comply with this data use agreement.
- I will accurately provide the requested information for persons who will use these data and the analyses that are planned using these data.
- I will comply with any rules and regulations imposed by my institution and its institutional review board in requesting these data.
- As an investigator who uses the data, I will ensure that appropriate administrative, physical, and technical safeguards are in place to prevent use or disclosure of the data other than as provided for by this agreement.
- If I publish abstracts using data from the CS598HDA course, I agree to the following:
- I will request permission from instructors first.
- I will cite partners as the source of data and the funding sources in the abstract as space allows.
- If I publish manuscripts using data from the CS598HDA course, I agree to the following:
- The authorship of the manuscript is determined by data PIs and course instructors.
- I will submit all manuscripts to PIs and course instructors prior to submitting to a journal.