Data Processing

Fall 2021

In this course you’ll build your own toolkit of useful programs with which you can read, transform and analyse data that you might find in various scientific areas. You’ll gain experience with professional development by implementing a final project for which you’ll define a problem based on a public dataset, design a visual data interface and implement a data pipeline to bring everything together.


Simon Pauw

Ìris Luden

Daan Moll

Martijn Stegeman


After this course, we envision that you:

  • can read data into your programs from several standard structured formats
  • can transform data into a form suitable for further analysis by combining basic operators
  • can build meaningful visualizations of your data
  • understand how to write programs that you and other programmers can easily understand
  • are aware of the many tools that can help you with version control, correctness testing and code reviews


This course assumes that you have completed Scientific Programming 1 and 2.

Beyond that, some modules assume high-school mathematics or physics, but many do not. If you feel overwhelmed, don’t hesitate to contact the course staff! We can explain the course’s philosophy and requirements, and recommend how to approach problems.

Passing the course

The course consists of two parts. In the first part of the course you will complete a number of programming assignments. In the second part you will work on your own (final) project.

You pass the course by:

  1. submitting sufficient coursework
  2. finishing the final project

Your grade will be determined by your final project. We will evaluate the following criteria:

  • Process book (20%)
    • Document the challenges you faced during the project and how you solved them
  • Code quality (30%)
    • Your code should be nicely formatted and commented
    • Your README file should clearly describe what every file does and how to run your code
  • Final product (50%)
    • Description of the pipeline of the project
    • Visualization itself
    • Description of the visualization

Asking questions

In this course you’ll mostly work on assignments independently. But you’re not on your own! We’re here to help. There are three ways you can get help:

  • Helpdesk (Programmeerbalie): Online or on campus. Book a slot to get help
  • Lab sessions: Only on campus. Work in a classroom together with other students
  • Forum: Only online.

See Help for more information.


Sufficient coursework means submitting a proper solution to each module.

You may not re-submit (variations of) solutions that you wrote for any other course’s problems. In case you have done similar assignments before, discuss with the course staff whether this is the right course for you.


Deadlines for each level are listed below. These deadlines can only be extended by agreement in advance. Send an e-mail detailing your plans to the course staff at and we will consider your proposal.

Start block 4 (7 Feb 2022)

  Finish course in:     8 weeks            16 weeks
  Acquisition           Mon 14 Feb 2022    Wed 23 Feb 2022
  Transformation        Mon 21 Feb 2022    Wed 16 Mar 2022
  Visualization         Mon 28 Feb 2022    Wed 06 Apr 2022
  Final Project         Fri 25 Mar 2022    Wed 18 May 2022
  Final presentation    Thu 31 Mar 2022    Fri 3 Jun 2022

Start block 5 (4 Apr 2022)

  Finish course in:     8 weeks
  Acquisition           Wed 13 Apr 2022
  Transformation        Thu 21 Apr 2022
  Visualization         Fri 29 Apr 2022
  Final Project         Wed 25 May 2022
  Final presentation    Fri 3 Jun 2022


Programming is like writing. You can gradually learn to write programs that are more beautiful, functional, short, elegant or simple. To learn this, you’ll need feedback, and it’s mostly up to you to get it. You can show your programs in class to fellow students or your teacher; you can post a fragment of your code on Stack Overflow and ask for advice on improving it; or you can send the staff an e-mail and we’ll have a look (this might take a while, though!).

Doing your own work

This course’s philosophy on academic honesty is best stated as “be reasonable.” The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problem sets is not permitted except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on the course’s test and quiz is not permitted at all.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from the course’s heads. Acts considered not reasonable by the course are handled harshly.

Reasonable

  • Communicating with classmates about problem sets’ problems in English (or some other spoken language).

  • Discussing the course’s material with others in order to understand it better.

  • Helping a classmate identify a bug in his or her code at office hours, elsewhere, or even online, as by viewing, compiling, or running his or her code, even on your own computer.

  • Incorporating a few lines of code that you find online or elsewhere into your own code, provided that those lines are not themselves solutions to assigned problems and that you cite the lines’ origins.

  • Reviewing past semesters’ quizzes and solutions thereto.

  • Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.

  • Sharing a few lines of your own code online so that others might help you identify and fix a bug.

  • Turning to the course’s heads for help or receiving help from the course’s heads during the quiz or test.

  • Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problem sets’ problems or your own final project.

  • Whiteboarding solutions to problem sets with others using diagrams or pseudocode but not actual code.

  • Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

Not Reasonable

  • Accessing a solution to some problem prior to (re-)submitting your own.

  • Asking a classmate to see his or her solution to a problem set’s problem before (re-)submitting your own.

  • Decompiling, deobfuscating, or disassembling the staff’s solutions to problem sets.

  • Failing to cite (as with comments) the origins of code or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.

  • Giving or showing to a classmate a solution to a problem set’s problem when it is he or she, and not you, who is struggling to solve it.

  • Looking at another individual’s work during the test or quiz.

  • Paying or offering to pay an individual for work that you may submit as (part of) your own.

  • Providing or making available solutions to problem sets to individuals who might take this course in the future.

  • Searching for or soliciting outright solutions to problem sets online or elsewhere.

  • Splitting a problem set’s workload with another individual and combining your work.

  • Submitting (after possibly modifying) the work of another individual beyond the few lines allowed herein.

  • Submitting the same or similar work to this course that you have submitted or will submit to another.

  • Submitting work to this course that you intend to use outside of the course (e.g., for a job) without prior approval from the course’s heads.

  • Turning to humans (besides the course’s heads) for help or receiving help from humans (besides the course’s heads) during the quiz or test.

  • Viewing another’s solution to a problem set’s problem and basing your own solution on it.

In all cases we follow the directives regarding fraud and plagiarism of the University of Amsterdam and of the Computer Science BSc programme. Find them here in English and Dutch.


This course has been designed by Marleen Rijksen, Wouter Vrielink, Tim Doolan, Martijn Stegeman, and Simon Pauw.

This work is partially based on many great programming resources that have been published as Open Courseware under a Creative Commons license. The resulting work is itself published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Feel free to re-use! If you would like to use the work commercially, please send an e-mail to arrange a license.