Software reliability remains one of the most important and challenging problems in computer science. Meanwhile, software is becoming more parallel in order to scale to today's and tomorrow's hardware. However, building reliable concurrent software is notoriously difficult because of the unique challenges posed by concurrency. In this course, we will investigate some of the most practical and important reliability techniques. Topics covered will include: static and dynamic program analyses, concurrency bug detection, testing, debugging, memory consistency models, programming models, deterministic replay, transactional memory, and event-driven systems.
The intent of this course is to help you gain an in-depth understanding of concurrency and the challenges in building reliable concurrent software. This understanding will make you a more effective programmer. You will also learn state-of-the-art techniques for detecting, debugging, and fixing concurrency bugs, and for improving the reliability of concurrent software.
If you are interested in doing research in the area of software reliability, this course can help you get started; if you are currently involved in research in other areas such as operating systems, networking, security, or databases, this course can help you apply the techniques you learn here to your own research area.
Workload and Evaluation
This course centers on readings and discussions and includes a final project. The course readings are research papers selected from top programming languages, software engineering, and systems conferences. We will discuss roughly two to three papers at every class meeting. Each student is expected to present two or three papers in class (the exact number will depend on course enrollment) and to prepare short summaries of the remaining papers. Finally, students will do a project (in pairs or individually). The following grading policy will be used:
Presentations and Discussions
For each paper, one student will be the designated presenter and will start off the discussion. Each presentation should take no more than 20 minutes and cover the following key points of the paper:
- What problem is the research trying to solve? Is it important? Why?
- How does the technique work?
- How is the technique/algorithm evaluated?
- What are the primary contributions of the paper as the authors see them?
- How could this research be extended, or applied in other contexts?
- Questions and discussion topics.
Everyone is expected to participate actively in the discussions. Course participants will be able to indicate their preferences for papers that they want to present, and an effort will be made to respect everyone's preferences.
You are expected to read all papers that will be discussed, and write a short summary for each paper you read. The summary can include:
- Problem domain: background, motivation, why this problem is important/novel/interesting.
- Idea of the paper: assumptions, main technique, results.
- Comparison with related work.
- Possible future work/direction.
- Pros, cons of the paper (your conclusion).
- What can we do based on this work: What lessons can be learned from it? Does it suggest new related problems? Can a flawed assumption or technique be improved or extended? Can the technique be applied to other domains?
The final project is the key component of this course. It is essentially a mini research project that may involve building a new system, designing a new algorithm, improving an existing technique, applying an existing technique to a new domain, or performing a large case study. You are encouraged to come up with a topic of your own, which I'll help refine; alternatively, you can choose one of the projects I suggest. You can work on your project alone or with a partner; teams of more than two students are not allowed. The timeline of the project is as follows:
- 10/2 -- Proposal. Choose a topic and submit a 1-2 page proposal (due before class). You will give a 5-10 minute presentation of your project proposal in class.
- 10/30 -- Midterm project report. Give a 10-minute presentation on the progress of your project.
- 12/1 11:59 pm -- Final report. Submit a 6-page final report, your source code, and all relevant data. Late submissions will receive no grade.
- 12/2 -- Project presentation and demo.
There are no formal prerequisites, but it will help to have some background in programming languages, compilers, software engineering, and/or operating systems in general; and program analysis, parallel/concurrent programming, and/or software reliability in particular.
The enrollment is open to PhD, MS and undergraduate students. If you are an undergraduate and would like to take the course, please email the instructor for permission.
There is no required textbook; all relevant materials will be made available online. See Course Syllabus.
Ethics & Academic Integrity
We will study and discuss threats and attacks in class and in the lab. You should be fully aware of the ethical implications of studying these techniques. If in any context you are not sure where to draw the line, come talk to me first.
"An Aggie does not lie, cheat, or steal or tolerate those who do." For additional information, please visit: http://aggiehonor.tamu.edu.
Upon accepting admission to Texas A&M University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the TAMU community from the requirements or the processes of the Honor System.
Americans with Disabilities Act (ADA) Statement
The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Disability Services, in Cain Hall, Room B118, or call 845-1637. For additional information visit http://disability.tamu.edu.
| Week | Date | Topic | Readings | Presenter |
|---|---|---|---|---|
| | 9/4 | Overview | Concurrency bug study, Coverity, Static and Dynamic analysis | Dr. Huang |
| 2 | 9/9 | Program analysis framework | Soot, WALA, JPF, KLEE | Dr. Huang |
| | 9/11 | More frameworks | LLVM, Pin, Valgrind | Dr. Huang |
| | 9/18 | Debugging | Delta debugging, Statistical debugging | Matthew |
| 4 | 9/23 | Data Races | What are races, Eraser, FastTrack | Arun |
| | 9/25 | | Hybrid race detection, RaceFuzzer | Obaida |
| 5 | 9/30 | Atomicity | Velodrome, GKLEE | Jinbin, Glen |
| 6 | 10/7 | Lock allocation and Linearizability | Lock allocation, Line-Up | Obaida |
| | 10/9 | Deadlocks | Gadara, CheckMate | Arun, Tian |
| 7 | 10/14 | Testing | Maple, Iterative context bounding | Guangliang, Jing |
| 8 | 10/21 | Memory Models | Overview, Sequential consistency | Arun, Jinbin |
| | 10/23 | Other bug finding | Memory leak detection, GPU race | Glen, Lei |
| 9 | 10/28 | Hack day! | Work on your project | |
| | 10/30 | Midterm project report | None | |
| 10 | 11/4 | Replay Debugging | DPJ, PinPlay | Jing |
| | 11/6 | Deterministic Multithreading | Dthreads, Kendo, PARROT | Guangliang |
| 11 | 11/11 | Hack week! | Work on your project | |
| 12 | 11/18 | Programming Models | Data-centric synchronization, Grace | Brandon, Robert |
| | 11/20 | Transactional Memory | Hybrid TM, Strong Atomicity TM | Robert |
| 13 | 11/25 | Event-driven Systems | Web race, Android race | Brandon, Lei |
| 14 | 12/2 | Student Project Presentation | | |