Introduction
The UK’s Department for Transport (formerly Ministry Of Transport, or MOT) makes available an anonymised dataset containing the results of every MOT test performed on a road vehicle each year [1]. In addition to the date of the test and the result, the dataset contains the make and model of every vehicle tested, the date on which it was first registered and mileage at the date of the test. In this assessment, you are going to create a desktop Java application capable of browsing the data contained within this dataset and performing some basic statistical analysis.
Acquiring & Understanding the Dataset
The dataset is available here, and appears to be published annually. At the time of writing, the most recent year for which data has been published is 2020, and this is the year’s data upon which you will be working. You will need to download a copy of the data from the site, using the “MOT Testing Data Results (2020)” link. You will be working only with the test results dataset, and not the failure item dataset, which contains data on what exactly caused vehicles to fail their MOT test. Take care to ensure that you download the correct file.
The .zip downloaded from the Department for Transport website contains a number of large Comma
Separated Value (CSV) files containing the MOT test data split by reporting period for the year. A CSV file is a plain text file containing tabular data similar to that in a spreadsheet or database table. Each line in a CSV file contains the data for one particular row of a table (i.e. one test). The first row of each CSV file, as is common, is a header row that gives the names of the columns in the file, to aid in understanding the data structure. The columns contained within the MOT test data CSV files are as follows, in the specified order:
test_id, vehicle_id, test_date, test_class_id, test_type, test_result, test_mileage, postcode_area, make, model, colour, fuel_type, cylinder_capacity, first_use_date
The test_id column contains a unique identifier for each test.
The vehicle_id column contains an identifier that uniquely identifies vehicles under test while avoiding disclosing registration numbers and maintaining anonymity.
The test_date column contains the dates tests were conducted, in ISO 8601 (yyyy-mm-dd) format.
The test_class_id column specifies the class of vehicle tests conducted. For more details see [2].
The test_type column specifies the types of test conducted (e.g. NT for normal test, RT for retest, PL/PV for retests conducted after fixing a minor identified problem)
The test_result column contains the results of tests conducted (e.g. P for pass, F for Fail, PRS for pass with a rectification to be carried out at the testing station)
The test_mileage column contains the mileage of vehicles tested, at the time of the tests.
The postcode_area column contains the alphabetical characters of the first part of the testing centre’s postcode, which serves as an indicator of the town/city/area in which the test was conducted.
The make, model and colour columns are all self-explanatory. Do note, however, that some of these values contain commas, and have thus been surrounded by quotation marks in the files. Some values contain quotation marks, and these have been escaped with a backslash.
The fuel_type column indicates the type of fuel used by vehicles. Notable values include PE (Petrol), DI (Diesel), EL (Electric) and ST (Steam).
The cylinder_capacity column contains the capacity of vehicles’ engine cylinders, commonly measured in cc. Vehicles’ engine capacities are calculated by multiplying by the number of cylinders.
The first_use_date column contains the date on which vehicles were first registered for use in ISO 8601 format (yyyy-mm-dd), allowing their age at date of test to be calculated in combination with the test_date column.
After the first (header) row, each test’s data is contained on a line of the CSV file. The values for each column are separated by commas which makes it fairly easy to parse the data programmatically. An example line from one of the files is depicted below:
1449915597,1108091828,2020-12-31,4,NT,PRS,128135,E,BMW,3 SERIES,SILVER,PE,2996,2007-06-29
To understand the data in the example above, correlate each of the values (separated by commas) with the associated column by consulting the column names and ordering on the previous page, or on the header line of the CSV file.
Your Task
In this assessment, you are going to create a desktop Java application capable of browsing the MOT test data you have downloaded and performing some basic statistical analysis. Because the data is fairly sizeable (approx 3GB), you will import it into an SQLite database to allow the data to be searched efficiently. You will need to submit an exported (.zip) eclipse project containing the following:
- The source code of a JAVA program that imports all of the CSV data you downloaded into an SQLite database using JDBC. The database itself will be too large for submission, so be sure not to include that in your exported work.
- The source code of a Java desktop application that allows users to query the created database using JDBC to view specific test results and identify trends.
- Some screenshots depicting the results of specified queries in your application.
You can find more details on what exactly you need to do for each of these tasks in the following sections.
Data Import Program (30%)
You will need to create a program capable of importing the data contained within the downloaded CSV files into an SQLite database using CSV. The program does not need any kind of user interface, nor does it need to accept any input other than the CSV files themselves. I would recommend that you create the program such that it searches its current working directory for all CSV files, and creates an SQLite database in this same location, deleting any existing database before it begins. Your program must create the SQLite database using a specified table structure, which is depicted in Fig. 1, below.
Figure 1: Entity-Relation Model of Required Database Table Structure
In addition to the tables depicted in Fig. 1, above, you may (and should!) create additional indices to improve the performance of your eventual desktop application, despite the less-than-fully normalised structure. You have discretion over how many indices to create and which columns should be covered.
Your program will need to INSERT all of the data in the downloaded CSV files into the tables created by your program. This may take a considerable length of time, so your program should output a progress update every thousand rows, and at the start/end of each file processed.
Your data import program will be assessed on the following criteria:
• Correctness of the generated database (20%)
• Quality of the code (10%)
The correctness of the generated database will be assessed by comparing the structure of the tables created to that depicted in Fig. 1, and by assessing how robust the program’s handling of erroneous or tricky input is. Rows in the CSV file with invalid input should be skipped, and your program should output a message explaining which row/column’s data was found to be invalid, and why.
The quality of the code will be assessed by reviewing it manually for readability, commenting, security & performance. This is the type of program that gets run only rarely, and should there be a change to the input file format it is highly likely that anyone modifying the program’s code would not have any recent familiarity with it. Thus, good quality code is particularly important here.
You must not use any additional libraries in your program, with the exception of the SQLite JDBC driver [3] as used in class.
Hints & Tips for Database Creation
SQLite’s performance when INSERTing many rows is dramatically increased by nesting all INSERTs inside a single SQL transaction. This is a simple matter of executing a BEGIN TRANSACTION before starting, and a COMMIT when done. Without this, each individual INSERT is wrapped in its own transaction, slowing things down considerably.
I would recommend reading up on SQLite’s support for INSERT OR IGNORE before writing code to populate the Vehicle table.
Be careful to handle fields that are enclosed in quotation marks, or that contain escaped commas, correctly. If you make a mistake here, some rows will have values in the wrong columns.
Desktop Application (70%)
You will create a Java desktop application featuring a Swing user interface that allows users to browse the data contained within the SQLite database created in the previous section, and to perform some basic statistical analysis on said data. Your application will be assessed on the following criteria.
• Implementation of basic search and display functionality (20%)
• Implementation of analysis and reporting functionality (20%)
• Quality of the Swing user interface (20%)
• Quality of the code (10%)
Each area of functionality is discussed in the following subsections.
Basic Search & Display Functionality
The basic search and display functionality should allow the user to search the database for all tests performed on a particular make of vehicle, a particular model of vehicle, vehicles first used in a particular year, vehicles whose mileage a the date of test was within a specified range, or any combination thereof. On performing a search, users should be able to view all matching tests in a JTable Swing component with all the aforementioned columns, plus the test result. Clicking a row in the table should display all of the data held on that particular test in another area of the user interface (e.g. a side panel, or a pop-up window) using appropriate Swing components.
Analysis & Reporting Functionality
The analysis and reporting functionality should allow the user to view graphs depicting the pass rates for vehicles of a specified make and model by age or by mileage. The graphs should be produced using the JFreeChart Java graphing library [4], for which there is a wide variety of documentation and code samples available. You have discretion over the style of the graphs, and the grouping of age and mileage ranges in their production, but I would suggest creating a line chart grouping age by year and grouping mileage by ten thousand mile intervals initially. An example of a JFreeChart line chart that allows the user to see pass rate of one vehicle model by age is depicted in Fig. 2, below.
Figure 2: Example JFreeChart Line Chart of MOT Pass Rate by Age
The chart depicted in Fig. 2, above, is based on a single quarter’s data, so expect to see slight differences if comparing your output to that in Fig. 2. This functionality is intended to be the most challenging part of the assignment, there will be very limited support available on this functionality. I would recommend leaving it until last, I expect many will choose not to implement analysis. At the time of writing, a jar of the most recent version of the JFreeChart library is available here.
Swing User Interface
The quality of the Swing user interface covers the visual appeal of the interface, as well as some quality-of-life features that users could well wish for when using the application. Visual appeal will be assessed on how easy it would be to actually use the application and on how effectively space within the application window is used. Is the information readable? Could a user understand how to use the application without training? Quality-of-life will be assessed on features such as: whether the application resizes gracefully, whether test classes and test results are displayed in plain English instead of by code, and whether JTables are sortable by clicking their column headings.
Code Quality
The quality of your code will be assessed by reviewing the application’s class structure and establishing how well the desire for well factored code with little repetition is balanced with the desire for a simple and easy to understand and maintain class structure. As with the data import program, your code’s readability, commenting, security and performance will also be considered.
Hints & Tips for the Desktop Application
Your user interface is likely to be moderately complex. I would recommend sketching out a design plan before starting to code, depicting the structure of the panel(s) and layout manager(s) used.
Because there will be a fair amount of user interface code, you will almost certainly need to separate the Swing code from the code controlling the application’s business logic by placing it into its own class, and carefully designing how data will be passed to and from the user interface.
The code for controlling the JTable and displaying individual results will be significantly easier to write if your application uses a TestData (or similar) class with properties for all the information held about a particular test.
You don’t need to implement every feature in the assignment specification, you will be given credit for what you do complete, even if some sections are only partially completed.
You can prototype the more complex SQL queries using a tool such as DB Browser [5] before attempting to implement the analysis and reporting features. Your choices in creating indices and designing the SQL queries will impact performance significantly here.
You must not use any additional libraries in the creation of you application, with the exception of the SQLite JDBC driver [3] and the JFreeChart graphing library [4] as mentioned above.
Screenshots
To demonstrate your program working, you will include screenshots of your desktop application performing various tasks in your submission. You do not need to collate the screenshots into a document or report. Instead, you should create a screenshots folder inside your exported eclipse project, and save the screenshots, with descriptive filenames, into this folder. The screenshots will be used to verify the functionality of your program in conjunction with a review of the program’s code, and features missing their associated screenshots will be presumed non-functional. You only need to include screenshots of the functionality you have functioning correctly; partially implemented features will be assessed when reviewing the code.
Basic Search & Display Functionality Screenshots
Your screenshots should depict the list of search results for each of the following criteria. For each, the first test result should be selected with all associated details visible in the program.
• BMW 3 series vehicles
• Renault Megane vehicles
• All Fiat vehicles
• Ford Focus vehicles first used in 2014
• All Volkswagen vehicles with a mileage of between 100k and 120k miles at date of test
• All vehicles first used in 2015 and under 2k miles at date of test • All vehicles with exactly 999,999 miles at date of test • All vehicles under -1 miles at date of test.
• All vehicles first used in 2055
• Any additional features you implemented
Analysis & Reporting Functionality Screenshots
Your screenshots should depict the your program’s output for each of the following reports.
• Pass rate of Peugeot 207 vehicles by age
• Pass rate of Kia Rio vehicles by age
• Pass rate of Ferrari 458 vehicles by mileage
• Pass rate of Mazda MX-5 vehicles by mileage
• Any additional features you implemented User Interface Screenshots
Your screenshots should depict your program after each of the following operations.
• Resizing the window to approx 1.5 times its normal size
• Resizing the window to approx 0.25 times its normal size
• Sorting the results of a search by descending mileage via the table column heading
• Sorting the results of a search by ascending first use date via the table column heading • Any additional features you implemented
Submission and Marking
Exporting Your Work
You will need to export your work from eclipse into a .zip file for marking, add the screenshots to the .zip, and submit the resultant file to Moodle before the deadline. The easiest way to perform the export from eclipse is using the File→Export menu, and selecting General and Archive File in the resultant dialog. Be sure you select (i.e. tick) both the code for both the database creation program and the desktop application itself. DO NOT include either the .CSV data or your generated SQLite database in your submission. Moodle has a maximum file size of 100Mb for uploaded work, and including the data in your .zip will exceed this limit and prevent you from submitting your work.
You can add the screenshots to the exported work using any standard file archiving tool. I recommend 7-Zip [6], if you are not already familiar with an equivalent tool. It is also possible to include the screenshots folder in the eclipse project itself, if you prefer. The resultant .zip file’s structure will be similar using either method.
Marking Criteria
As indicated in the section headings, the database creation program is weighted at 30% of the assignment, and the desktop application at 70%. You will receive a single overall grade for the assignment, marked out of 100. You will be assessed both on how comprehensively you meet the specification, and the quality of the work completed. An indicative marking scheme is depicted in Fig. 3, below.
Figure 3: Marking Scheme
References
- “Anonymised MOT tests and results”, Data.gov.uk, 2022. [Online]. Available:
https://data.gov.uk/dataset/e3939ef8-30c7-4ca8-9c7c-ad9475cc9b2f/anonymised-mot-tests-and-results. [Accessed: 05-Jan-2022] - “MOT testing data user guide (post-May 2018)”, Department for Transport, 2021. [Online] Available: https://data.dft.gov.uk/anonymised-mot-test/test_data/mot-testing-data-user-guide.odt [Accessed: 05Jan-2022]
- “SQLite JDBC Driver”, GitHub, 2022. [Online] Available: https://github.com/xerial/sqlite-jdbc [Accessed: 07-Jan-2022]
- “JFreeChart”, D. Gilbert, 2021. [Online] Available: https://www.jfree.org/jfreechart/ [Accessed: 07-Jan2022]
- “DB Browser for SQLite”, DB4S, 2021. [Online] Available: https://sqlitebrowser.org [Accessed: 07-Jan2022]
- “7-Zip”, I. Pavlov, 2021. [Online] Available: https://www.7-zip.org [Accessed 10-Jan-2022]