Data Modeling

We’ve frequently pointed out that Java classes can help you work with data. By creating your own Java types you can model real-world entities, allowing you to both create new data and work with existing data sets. But we haven’t given you a chance to actually do that, until now.

Today’s lab gives you a chance to work with data in the way that you are likely to find it in the wild—stored as text in a file. You’ll write code to load the data into a Java class that you define, and write some methods to process it the same way you would if you were performing an actual data analysis or investigation.

If you need help with today’s lab, please join this Zoom meeting. It will be staffed during normal lab hours: 9AM–9PM CST.

1. CSV (60 Minutes Total)

Both today’s lab and the MP introduce you to different ways of representing Java objects as text, one instance of a broader set of techniques known as serialization. The MP introduced you to JSON, a powerful way to convert Java objects to strings while preserving much of the object’s structure.

But in lab today you’ll work with the CSV (comma-separated value) format. While it is far more limited than JSON, it is fairly ubiquitous, and many of the interesting data sets that you can find online are available in CSV format.

What is the CSV format? Imagine that I have data about pets stored in our usual Pet class:

public class Pet {
  public String name;
  public int age;
  public String type;
  public Pet(String setName, int setAge, String setType) {
    name = setName;
    age = setAge;
    type = setType;
  }
}

Now let’s say that I want to save that data to a file. Maybe I want to make sure it is saved so that the next time my program runs I still have it. Or maybe I need to send it to a friend, or want to work with it using a spreadsheet tool like Google Sheets. Or maybe I want to do some work in another programming language—like Python, or JavaScript, or Go. Regardless: I need some way to save my data about pets so that I can read it back in later.

CSV is one way to do that. A CSV file consists of a series of records, one on each line. Each record contains multiple fields, separated by commas. To save a Pet record I need to convert each of its fields to a String and write them to a file as a single line, with the fields separated by commas.

So, for example, imagine that I had the following three Pets in my program:

Pet chuchu = new Pet("Chuchu", 14, "dog");
Pet xyz = new Pet("Xyz", 4, "cat");
Pet balou = new Pet("Balou", 15, "dog");

Converted to CSV format, those three objects could look like this:

Chuchu,14,dog
Xyz,4,cat
Balou,15,dog

However, they could also look like this:

dog,Chuchu,14
cat,Xyz,4
dog,Balou,15

Same fields, different order—confusing! As a result, we usually add a header specifying the name for each field, to eliminate ambiguity and make sure that we don’t forget how we saved our data:

Name,Age,Type
Chuchu,14,dog
Xyz,4,cat
Balou,15,dog
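
As a rough illustration of the conversion described above, here is a minimal sketch of turning Pet objects into CSV text. The names PetCsvSketch, toCsvLine, and toCsv are hypothetical and not part of the lab, and the sketch assumes that no field contains a comma or a newline.

```java
// The Pet class from above, repeated so that this sketch compiles on its own.
class Pet {
  public String name;
  public int age;
  public String type;

  public Pet(String setName, int setAge, String setType) {
    name = setName;
    age = setAge;
    type = setType;
  }
}

// Hypothetical helper class: the names here are illustrative only.
class PetCsvSketch {
  // Convert one Pet to a single record in Name,Age,Type order.
  // Assumes that no field contains a comma or a newline.
  static String toCsvLine(Pet pet) {
    return pet.name + "," + pet.age + "," + pet.type;
  }

  // Build the full CSV text: the header first, then one record per line.
  static String toCsv(Pet[] pets) {
    StringBuilder csv = new StringBuilder("Name,Age,Type\n");
    for (Pet pet : pets) {
      csv.append(toCsvLine(pet)).append("\n");
    }
    return csv.toString();
  }
}
```

Calling toCsv on an array holding the three pets above would produce the header followed by the same three records shown in the last example.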

1.1. Data Modeling Using CSV Data

So now you’ve seen how CSV files are generated using a simple Java class. But how would we do the reverse? Imagine we had a file containing some CSV data and we wanted to load and work with it in Java. How would we do that?

Imagine that we have a CSV of geocache locations as part of a game we’re playing. Each location is worth a certain number of points. Here’s the CSV header and two example records:

Latitude,Longitude,Points
40.482979,-88.993390,100
40.197184,-88.366315,-100

To design a Java class to model this data requires making a few decisions.

  • What should we call the class? Usually this requires some information about what it is being used for.

  • What should the names of the instance variables be? Just using the same names as the header is a pretty good convention, although we’ll want to use camel case to avoid variables that start with an uppercase letter.

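  • What should the types of the instance variables be? Here we want to examine a bit of the data itself: the latitude and longitude values contain decimal points, so double is a reasonable fit, while the points are whole numbers, so int works.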

Here’s an example class based on the CSV shown above:

public class GeocacheLocation {
  private double latitude;
  private double longitude;
  private int points;
  // Getters and setters not shown
}

When modeling data these decisions are yours to make. But adhering to a few simple conventions will help make your code easier to read and maintain.

1.2. Converting Records to Objects

Next we have to figure out how, given a line of text like:

40.482979,-88.993390,100

we end up with a GeocacheLocation object with latitude == 40.482979, longitude == -88.993390, and points == 100. There are libraries that can do this for you, but it’s not hard to do yourself. Remember String.split and String.trim? Those methods come in handy here. And as an additional reminder, Integer.valueOf and Double.valueOf will convert a trimmed String into an Integer or Double, which Java automatically unboxes to an int or double, respectively. That’s a good bit of what you need to know to get started!
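
One possible way to put those pieces together is sketched below. The constructor, the getters, and the static method fromCsvLine are assumptions made for illustration; the lab may ask for a different design. The sketch also assumes that every record has exactly three fields in Latitude,Longitude,Points order.

```java
class GeocacheLocation {
  private double latitude;
  private double longitude;
  private int points;

  GeocacheLocation(double setLatitude, double setLongitude, int setPoints) {
    latitude = setLatitude;
    longitude = setLongitude;
    points = setPoints;
  }

  double getLatitude() { return latitude; }
  double getLongitude() { return longitude; }
  int getPoints() { return points; }

  // Hypothetical helper: convert one record such as
  // "40.482979,-88.993390,100" into a GeocacheLocation.
  static GeocacheLocation fromCsvLine(String line) {
    String[] parts = line.split(",");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected 3 fields but found " + parts.length);
    }
    // trim removes stray whitespace around each field before conversion.
    // valueOf returns Double or Integer, which unbox to double or int.
    double latitude = Double.valueOf(parts[0].trim());
    double longitude = Double.valueOf(parts[1].trim());
    int points = Integer.valueOf(parts[2].trim());
    return new GeocacheLocation(latitude, longitude, points);
  }
}
```

If a field is not a valid number, valueOf throws a NumberFormatException. The sketch lets that exception propagate rather than guessing at a default value.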

1.3. Loading Entire Files

Once you can convert a single line containing a record to an object of the appropriate type it’s easy to extend this idea to load an entire file containing multiple records. You end up with an array of objects. So if we loaded:

40.482979,-88.993390,100
40.197184,-88.366315,-100

we’d end up with a GeocacheLocation array of length 2.
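
Here is a minimal sketch of that extension. It assumes the file contents have already been read into a single String, that the first line is the header, and that there are no blank lines between records. GeocacheLoaderSketch, load, and fromCsvLine are hypothetical names used only for this example.

```java
class GeocacheLocation {
  private double latitude;
  private double longitude;
  private int points;

  GeocacheLocation(double setLatitude, double setLongitude, int setPoints) {
    latitude = setLatitude;
    longitude = setLongitude;
    points = setPoints;
  }

  double getLatitude() { return latitude; }
  double getLongitude() { return longitude; }
  int getPoints() { return points; }

  // Hypothetical helper: parse one Latitude,Longitude,Points record.
  static GeocacheLocation fromCsvLine(String line) {
    String[] parts = line.split(",");
    return new GeocacheLocation(
        Double.valueOf(parts[0].trim()),
        Double.valueOf(parts[1].trim()),
        Integer.valueOf(parts[2].trim()));
  }
}

class GeocacheLoaderSketch {
  // Convert CSV text (header plus records) into an array of objects.
  static GeocacheLocation[] load(String csv) {
    String[] lines = csv.trim().split("\n");
    // lines[0] is assumed to be the header, so it is skipped.
    GeocacheLocation[] locations = new GeocacheLocation[lines.length - 1];
    for (int i = 1; i < lines.length; i++) {
      locations[i - 1] = GeocacheLocation.fromCsvLine(lines[i]);
    }
    return locations;
  }
}
```

Sizing the array up front works because each line after the header becomes exactly one object.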

1.4. Data Processing

Once we have an array of GeocacheLocation objects we can work with them like any other Java objects. For example, if we knew our current location we could use it to determine which GeocacheLocation object was closest to us. Or we could find the one worth the most points. At this point we’re just writing Java code: the CSV parts are behind us!
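
Both ideas can be sketched as a simple linear search over the array. The class GeocacheProcessingSketch and its methods are hypothetical names. The closest method compares squared differences in degrees, which is a simplification chosen for brevity and not a true geographic distance.

```java
class GeocacheLocation {
  private double latitude;
  private double longitude;
  private int points;

  GeocacheLocation(double setLatitude, double setLongitude, int setPoints) {
    latitude = setLatitude;
    longitude = setLongitude;
    points = setPoints;
  }

  double getLatitude() { return latitude; }
  double getLongitude() { return longitude; }
  int getPoints() { return points; }
}

class GeocacheProcessingSketch {
  // Return the location worth the most points (the first one on a tie).
  static GeocacheLocation mostPoints(GeocacheLocation[] locations) {
    if (locations.length == 0) {
      throw new IllegalArgumentException("No locations to search");
    }
    GeocacheLocation best = locations[0];
    for (GeocacheLocation location : locations) {
      if (location.getPoints() > best.getPoints()) {
        best = location;
      }
    }
    return best;
  }

  // Return the location nearest to the given position, using squared
  // differences in degrees as a rough stand-in for real distance.
  static GeocacheLocation closest(
      GeocacheLocation[] locations, double latitude, double longitude) {
    if (locations.length == 0) {
      throw new IllegalArgumentException("No locations to search");
    }
    GeocacheLocation best = locations[0];
    double bestDistance = Double.MAX_VALUE;
    for (GeocacheLocation location : locations) {
      double latitudeDiff = location.getLatitude() - latitude;
      double longitudeDiff = location.getLongitude() - longitude;
      double distance = latitudeDiff * latitudeDiff + longitudeDiff * longitudeDiff;
      if (distance < bestDistance) {
        bestDistance = distance;
        best = location;
      }
    }
    return best;
  }
}
```

Neither method touches CSV at all: once the objects exist, the processing code only depends on the getters.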

1.5. Practice With CSVs and Data Processing

Today’s lab homework gives you practice working with data in the same way as described above. You don’t actually get to read the data from the file, but you get to do everything else. This is great practice with object design, and a review of String processing and basic algorithms as we prepare to begin talking about algorithms and data structures next week.

Good luck, and, as always, have fun! Hopefully this will help demystify the process of working with data in Java.

2. Before You Leave

Don’t leave lab until:

  1. You’ve completed our in-lab testing homework problems.

  2. And so has everyone else in your lab!

If you need more help completing the tasks above please come to office hours, use the new help system, or post on the forum.


Created 7/14/2020
Updated 7/14/2020
Commit 2091ea3
Built 7/14/2020 @ 17:43 EDT