Software engineering for machine learning (research seminar)

Russian version

This semester-long research course is dedicated to the application of machine learning methods and big data technologies to computer science and software engineering.

Assoc. Prof. Anton S. Khritankov, ATP MIPT

Website: http://pps-design.org/apmdd:engseminar

Email: anton.khritankov (at) phystech.edu

Research topics of the course include:

  • Program synthesis, mining of software repositories and program analysis
  • Software verification and program repair
  • Reproducible research, DevOps, MLOps
  • Research automation, systematic literature reviews and tools

In Spring 2021, the course is compulsory for master's students of the ATP department, but it can also be taken as an elective.

Preliminary schedule and important dates for Spring 2021:

  • (Week 1) Course starts
  • (Week 2) Topic selection is due
  • (Week 3) SLR (systematic literature review) short list review
  • (Week 5) SLR is due
  • (Week 6) Reproducibility study (RS) or prototype topic approval
  • (Week 8) RS/prototype first review
  • (Week 10) RS/prototype is due
  • (Weeks 12-13) Course finals

The main activity in the course is research practice on a selected topic under the supervision of the course staff. This practice includes a literature review, implementation of a research prototype, or reproduction of previously published scientific results.

The following goals must be achieved:

  1. Make an informed choice of a topic and propose a goal for the research practice
  2. Prepare a short systematic review on the selected topic, or
  3. Study and reproduce key recent results, or
  4. Implement your own preliminary results on the topic as software
  5. Present your results at the research seminar

A systematic literature review (SLR) is an established meta-research method for accumulating answers to specific research questions from the available literature. When done well, an SLR can be considered a reproducible research method; that is, its results and conclusions can be repeated by other scientists.

The overall SLR procedure is given in the table below. The results of an SLR include:

  1. the full list of sources considered (long list)
  2. a list of relevant sources (short list)
  3. an SLR report with the review statement, goals, research questions, analysis and conclusions

| # | Activity | Outcomes | Work products |
|---|---|---|---|
| 1 | Define research goals | The goal of the research along with the topic; a list of specific research questions (RQs) | A section in the SLR report |
| 2 | Describe the search protocol | A list of paper sources and scientific search engines; search queries | A separate section in the SLR report |
| 3 | Write down inclusion/exclusion criteria | Requirements that papers/sources and their contents must meet to be studied in detail (relevance to the RQs, quality of the paper, etc.) | A section in the SLR report |
| 4 | Search for sources according to the protocol | The full list of sources (long list) | A table with titles, links and abstracts in a separate file |
| 5 | Select papers for detailed consideration; return to step 4 if necessary | The short list of relevant and useful sources | A table with titles, links and abstracts in a separate file |
| 6 | Collect information relevant to each research question | A short summary of findings for each research question, with references to the sources | A separate subsection for each research question in the SLR report |
| 7 | Conclude the research: have all RQs been answered, ways to improve, threats to validity | Research analysis and conclusions, recommendations for readers | A separate section in the SLR report |
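Steps 3 to 5 (applying the inclusion/exclusion criteria to turn the long list into a short list) can be partly automated as a first screening pass. The Python sketch below is a minimal illustration, assuming the long list is kept as simple records with title, link, abstract and year. The keyword lists, the `MIN_YEAR` cutoff and the names `Source`, `is_relevant`, `screen` and `to_csv` are hypothetical placeholders, not criteria prescribed by the course or by Kitchenham and Charters.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    """One entry of the long list (hypothetical record layout)."""
    title: str
    link: str
    abstract: str
    year: int


# Hypothetical criteria for an SLR on program repair; replace them with the
# inclusion/exclusion criteria written down in step 3 of your protocol.
INCLUDE_KEYWORDS = ("program repair", "bug fix")
EXCLUDE_KEYWORDS = ("survey", "position paper")
MIN_YEAR = 2015


def is_relevant(src: Source) -> bool:
    """Keep a source if it matches an inclusion keyword, matches no
    exclusion keyword, and is recent enough."""
    text = f"{src.title} {src.abstract}".lower()
    included = any(k in text for k in INCLUDE_KEYWORDS)
    excluded = any(k in text for k in EXCLUDE_KEYWORDS)
    return included and not excluded and src.year >= MIN_YEAR


def screen(long_list: List[Source]) -> List[Source]:
    """First-pass screening: long list -> candidate short list."""
    return [s for s in long_list if is_relevant(s)]


def to_csv(sources: List[Source]) -> str:
    """Serialize sources as a titles/links/abstracts table for a separate file."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["title", "link", "abstract"])
    for s in sources:
        writer.writerow([s.title, s.link, s.abstract])
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    long_list = [
        Source("Learning to fix bugs", "https://example.org/a",
               "Automated program repair with neural models.", 2020),
        Source("A survey of program repair", "https://example.org/b",
               "We review existing tools.", 2019),
        Source("Register allocation", "https://example.org/c",
               "Compiler back ends.", 2021),
    ]
    print(to_csv(screen(long_list)))
```

Keyword matching only narrows the long list; whether a paper enters the short list is still decided by reading it against the criteria, and the screening rules themselves belong in the search protocol section of the report.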

References:

  1. Kitchenham, B.A. and S. Charters (2007) Guidelines for performing systematic literature reviews in software engineering, Technical Report EBSE-2007-01, School of Computer Science and Mathematics, Keele University.
  2. Course slides

Reproducing previously published results is an important part of the scientific method. It helps uncover inaccuracies, missing details and incomplete reporting in influential results in the field. For the person reproducing the work, it helps to understand the specifics of the results more deeply and to gain hands-on experience in the field.

The overall procedure for a reproducibility study (RS) is given in the table below. The work products of the study include:

  1. A reproducibility study report
  2. Source code, datasets and configuration used for reproduction
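One way to keep the configuration used for reproduction alongside the code is to fix random seeds and record the environment in a machine-readable file committed to the repository. The sketch below assumes a Python project and uses only the standard library; the function names, the config fields, the `data/train.csv` path and the `run_config.json` file name are illustrative assumptions, not part of the course requirements.

```python
import json
import platform
import random
import subprocess
from typing import Any, Dict, Optional


def set_seed(seed: int) -> None:
    """Seed Python's built-in RNG. If the project uses NumPy or PyTorch,
    their generators must be seeded separately."""
    random.seed(seed)


def git_commit() -> Optional[str]:
    """Return the current commit hash, or None outside a git repository
    (or when git is not installed)."""
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def make_run_config(seed: int, dataset: str,
                    params: Dict[str, Any]) -> Dict[str, Any]:
    """Collect what is needed to rerun the experiment later."""
    return {
        "seed": seed,
        "dataset": dataset,   # path or URL of the data actually used
        "params": params,     # hyperparameters as stated in the paper
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "git_commit": git_commit(),
    }


def save_config(config: Dict[str, Any], path: str = "run_config.json") -> None:
    """Write the config as sorted, indented JSON so diffs stay readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, sort_keys=True)


if __name__ == "__main__":
    set_seed(42)
    cfg = make_run_config(42, "data/train.csv", {"learning_rate": 0.001})
    save_config(cfg)
```

Storing the commit hash next to the parameters ties each reported number to an exact version of the code, which makes discrepancies easier to trace later in the RS report.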

General requirements for papers selected for reproduction:

  1. source code is available for the paper's main result, and you are familiar with the programming technology used
  2. the needed data are available under a permissive license, and the data volume and computational cost are appropriate for reproduction
  3. the paper's results represent the state of the art, either currently or within the last 3 years
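The requirements above can be turned into a simple checklist to run over candidate papers before requesting approval. This is a hypothetical sketch: the `Candidate` fields, the set of licenses treated as permissive and the default compute budget are assumptions for illustration; only the 3-year window comes from the requirements themselves.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumption: SPDX identifiers treated as permissive for this illustration.
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause",
                       "CC-BY-4.0", "CC0-1.0"}


@dataclass
class Candidate:
    """A paper considered for reproduction (hypothetical fields)."""
    title: str
    year: int
    code_available: bool      # source code for the main result is published
    familiar_stack: bool      # you know the programming technology used
    data_license: str         # SPDX identifier of the dataset license
    est_compute_hours: float  # rough estimate of the compute needed


def check_requirements(c: Candidate,
                       current_year: Optional[int] = None,
                       max_age_years: int = 3,
                       compute_budget_hours: float = 50.0) -> List[str]:
    """Return the list of unmet requirements; an empty list means the paper passes."""
    year = current_year if current_year is not None else date.today().year
    problems = []
    if not c.code_available:
        problems.append("no source code for the main result")
    if not c.familiar_stack:
        problems.append("unfamiliar programming technology")
    if c.data_license not in PERMISSIVE_LICENSES:
        problems.append(f"data license {c.data_license!r} is not in the permissive list")
    if c.est_compute_hours > compute_budget_hours:
        problems.append("estimated compute exceeds the budget")
    if year - c.year > max_age_years:
        problems.append("older than the state-of-the-art window")
    return problems
```

The returned list of unmet requirements doubles as raw material for the rationale that goes into the RS report; license terms and the realistic data volume still need a human judgement.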

| # | Activity | Outcomes | Work products |
|---|---|---|---|
| 1 | Find a paper (or choose one from your SLR) and get it approved for the study | An approved paper you are comfortable working with | Approval by email; the rationale for the choice goes into the RS report |
| 2 | Retrieve code and data, check licenses | Confirmation that the requirements for reproduction are met | A VCS repository with code and data (or references to data) |
| 3 | Repeat the experiment as described in the paper | Any discrepancies in the procedure and/or intermediate results, omissions, inaccuracies | A section in the RS report (RS protocol) |
| 4 | Compare the final results with those in the paper | Your own experimental results, with discrepancies stated and their sources analysed | A section in the RS report |
| 5 | Draw conclusions on the reproducibility of the results and reflect on lessons learned | Reproducibility conclusions, sources of error, threats to validity and limitations; results from the paper that can be reused | A section with conclusions in the RS report |
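For step 4, a small helper can list each metric reported in the paper next to the reproduced value and flag discrepancies. The sketch below assumes metrics are plain numbers keyed by name and uses a hypothetical 2% relative tolerance via `math.isclose`; the function name and tolerance are illustrative choices, and metrics computed only in the reproduction are ignored.

```python
import math
from typing import Dict, List, Optional, Tuple

# (metric name, value in paper, reproduced value or None, status)
Row = Tuple[str, float, Optional[float], str]


def compare_results(reported: Dict[str, float],
                    reproduced: Dict[str, float],
                    rel_tol: float = 0.02) -> List[Row]:
    """Pair every metric reported in the paper with the reproduced value.

    Status is "match" if the values agree within rel_tol (relative tolerance),
    "discrepancy" otherwise, and "missing" if the metric was not reproduced.
    """
    rows: List[Row] = []
    for metric, paper_value in reported.items():
        ours = reproduced.get(metric)
        if ours is None:
            rows.append((metric, paper_value, None, "missing"))
        elif math.isclose(ours, paper_value, rel_tol=rel_tol):
            rows.append((metric, paper_value, ours, "match"))
        else:
            rows.append((metric, paper_value, ours, "discrepancy"))
    return rows


if __name__ == "__main__":
    # Placeholder numbers, not results from any real paper.
    print(compare_results({"accuracy": 0.90, "f1": 0.80}, {"accuracy": 0.895}))
    # [('accuracy', 0.9, 0.895, 'match'), ('f1', 0.8, None, 'missing')]
```

A suitable tolerance depends on the metric and on run-to-run variance; repeating the experiment with several seeds and reporting the spread gives a firmer basis for calling a difference a discrepancy.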

References:

  1. Sandve, G.K., Nekrutenko, A., Taylor, J. and E. Hovig (2013) Ten simple rules for reproducible computational research, PLoS Computational Biology 9(10): e1003285.
  2. Course slides
  3. reproducibleresearch.net
  4. gitlab.com/mlrep/mldev