Python Data Classes for the Masses!

Before we jump into data classes, let's review the concept of a regular class in object-oriented programming (OOP).  A class bundles two things: attributes (the data) and methods (the behavior). Developers use classes in many different ways; some classes are functionality-focused, while others are more data-focused.

Data classes provide a set of enhancements on top of regular classes: they automatically add an initializer, a readable printed representation, value-based comparison, optional ordering support, and more.

Let's get to our example already!

Let's use the following example as our class - videogame.

The videogame class contains the following primary attributes:

  • title (STRING): Videogame's title
  • developer (STRING): Videogame's developer
  • release (INTEGER): Year the videogame was released

We will create three instances of that class, to hold our examples:

  • Super Mario World / Nintendo / 1990
  • Metal Gear Solid / Sony / 1998
  • Halo / Microsoft / 2001

Armed with that knowledge, this is how we define the class and create our three videogames, plus a second copy of Halo (halo_again) that we will use to test comparison:

class Videogame:
    def __init__(self, title, developer, release):
        self.title = title
        self.developer = developer
        self.release = release




smw = Videogame("Super Mario World", "Nintendo", 1990)
mgs = Videogame("Metal Gear Solid", "Sony", 1998)
halo = Videogame("Halo", "Microsoft", 2001)
halo_again = Videogame("Halo", "Microsoft", 2001)

# Let's print a few things and see how this looks
print(smw)
print(mgs)
print(halo)
print(halo_again)

print(f"Is halo the same as halo_again? {halo == halo_again}")

Now, when we execute this, the following shows up on our terminal:

python ./videogame.py
<__main__.Videogame object at 0x7f61057bad60>
<__main__.Videogame object at 0x7f61057badf0>
<__main__.Videogame object at 0x7f61057c6280>
<__main__.Videogame object at 0x7f61057c62b0>
Is halo the same as halo_again? False

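Notice that printing the object tells us nothing about what is inside it.  It only shows that it is an instance of the class at a specific memory address.  Likewise, halo and halo_again compare as different even though they hold the same data, because a regular class compares objects by identity by default.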
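
For comparison, here is a minimal sketch of what we would have to write by hand to get a readable printout and data-based equality out of a regular class.  The class name ManualVideogame is hypothetical and is only used to keep it separate from the example above.

```python
class ManualVideogame:
    def __init__(self, title, developer, release):
        self.title = title
        self.developer = developer
        self.release = release

    def __repr__(self):
        # Readable printout instead of the default memory address
        return (
            f"ManualVideogame(title={self.title!r}, "
            f"developer={self.developer!r}, release={self.release!r})"
        )

    def __eq__(self, other):
        # Compare by the data held, not by object identity
        if not isinstance(other, ManualVideogame):
            return NotImplemented
        return (self.title, self.developer, self.release) == (
            other.title,
            other.developer,
            other.release,
        )


halo = ManualVideogame("Halo", "Microsoft", 2001)
halo_again = ManualVideogame("Halo", "Microsoft", 2001)

print(halo)
print(f"Is halo the same as halo_again? {halo == halo_again}")
```

That is a fair amount of boilerplate for three attributes, and it has to be kept in sync by hand every time an attribute is added.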

Bring on the Data Classes!

Important - Python 3.7+ is REQUIRED in order to use data classes.

Data classes were introduced in Python 3.7, as documented in PEP 557, linked below:

PEP 557 -- Data Classes

Using a data class is as simple as importing the dataclass decorator from the dataclasses module, adding it to our existing class, and making one small modification: declaring each attribute with a type annotation.  Since the decorator generates an initializer for us, we no longer need to write the __init__ method ourselves.

Here is what that would look like on our videogame class:

from dataclasses import dataclass



@dataclass
class Videogame:
    title: str
    developer: str
    release: int




smw = Videogame("Super Mario World", "Nintendo", 1990)
mgs = Videogame("Metal Gear Solid", "Sony", 1998)
halo = Videogame("Halo", "Microsoft", 2001)
halo_again = Videogame("Halo", "Microsoft", 2001)

# Let's print a few things and see how this looks
print(smw)
print(mgs)
print(halo)
print(halo_again)

print(f"Is halo the same as halo_again? {halo == halo_again}")

Now, when we execute it, our results are a bit different, and so is the comparison behavior.  See below:

python3 videogameDC.py
Videogame(title='Super Mario World', developer='Nintendo', release=1990)
Videogame(title='Metal Gear Solid', developer='Sony', release=1998)
Videogame(title='Halo', developer='Microsoft', release=2001)
Videogame(title='Halo', developer='Microsoft', release=2001)
Is halo the same as halo_again? True

As you can see, the printout now shows the contents of the object, with all of its attributes.  When comparing halo with halo_again, we see that they are considered equal (even though they are separate objects, at separate memory addresses), because they hold exactly the same data.
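
To make the difference between "equal" and "the same object" concrete, here is a small sketch that reuses the class from above and checks both:

```python
from dataclasses import dataclass


@dataclass
class Videogame:
    title: str
    developer: str
    release: int


halo = Videogame("Halo", "Microsoft", 2001)
halo_again = Videogame("Halo", "Microsoft", 2001)

# == uses the generated __eq__, which compares the fields
print(halo == halo_again)  # True
# "is" checks whether both names point to the very same object
print(halo is halo_again)  # False
```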

Let's add a bit of fun to the mix

Now that we have decorated our class with @dataclass, we can add a few more optional goodies to help us when using it.  Let's start by looking at adding an order to our dataclass to help with comparison:

@dataclass(order=True)
class Videogame:
    title: str
    developer: str
    release: int

That simple parameter on the decorator allows us to compare two instances of the class and determine which one is greater than the other.  By default, the comparison treats each instance as a tuple of its fields, in the order they are defined, which here would start with the title.  In our case, we want the comparison to be based on the release year of the videogame, so we need to do a little bit more work.  This means adding the following tidbits to our code:

  1. Specify a sort_index so that our data class knows how to sort
  2. Specify the field to use for sorting, so that our data class knows what to use for sorting
  3. Specify that the sort_index is only to be used for sorting, and not for instantiating/printing the class (must import the field from the dataclasses module)
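
To see why the extra work is needed, here is a sketch of what order=True does on its own, without a sort_index.  The class name PlainVideogame is hypothetical and is only used so it does not clash with our real class.

```python
from dataclasses import dataclass


@dataclass(order=True)
class PlainVideogame:
    title: str
    developer: str
    release: int


smw = PlainVideogame("Super Mario World", "Nintendo", 1990)
halo = PlainVideogame("Halo", "Microsoft", 2001)

# Compared like ("Halo", "Microsoft", 2001) > ("Super Mario World", "Nintendo", 1990).
# "H" sorts before "S", so the title decides and the release year is never reached.
print(halo > smw)  # False
```

Halo came out later, yet it compares as "smaller" here, which is not the answer we are after.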

Combined, these code changes will make our class look like this:

from dataclasses import dataclass, field



@dataclass(order=True)
class Videogame:
    # add sort index, so that we can tell our class HOW to sort (step 1)
    # and tell the dataclass not to use in instantiation nor print (step 3)
    sort_index: int = field(init=False, repr=False)
    title: str
    developer: str
    release: int

    # use the post_init method to tell the class WHAT to use for sorting (step 2)
    def __post_init__(self):
        self.sort_index = self.release

And now, we can ask our code some specific questions about the release years of our videogames, in a very simple way:

print(f"Did Halo come out after Super Mario World? {halo > smw}")
print(f"Did Metal Gear Solid come out after Halo? {mgs > halo}")

And when we run our code, we get the following:

python3 videogameDC.py
Videogame(title='Super Mario World', developer='Nintendo', release=1990)
Videogame(title='Metal Gear Solid', developer='Sony', release=1998)
Videogame(title='Halo', developer='Microsoft', release=2001)
Videogame(title='Halo', developer='Microsoft', release=2001)
Is halo the same as halo_again? True
Did Halo come out after Super Mario World? True
Did Metal Gear Solid come out after Halo? False
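
The same ordering also works with Python's built-in sorting.  Here is a small sketch, assuming the class defined above; the games list is our own illustrative addition.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Videogame:
    sort_index: int = field(init=False, repr=False)
    title: str
    developer: str
    release: int

    def __post_init__(self):
        self.sort_index = self.release


games = [
    Videogame("Halo", "Microsoft", 2001),
    Videogame("Super Mario World", "Nintendo", 1990),
    Videogame("Metal Gear Solid", "Sony", 1998),
]

# sorted() relies on the generated __lt__, so no key function is needed
for game in sorted(games):
    print(game.release, game.title)
```

This should list the games from oldest to newest, since sort_index is the first field compared.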

Now, as our data grows and we need more attributes on our videogame class, we can add them to the data class and assign default values, which keeps the existing code working as expected.  For example, let's say that we want to add a protagonist attribute to our class.

@dataclass(order=True)
class Videogame:
    sort_index: int = field(init=False, repr=False)
    title: str
    developer: str
    release: int
    protagonist: str

    def __post_init__(self):
        self.sort_index = self.release




smw = Videogame("Super Mario World", "Nintendo", 1990)
mgs = Videogame("Metal Gear Solid", "Sony", 1998)
halo = Videogame("Halo", "Microsoft", 2001)

If we just run the script above, Python will raise an error, because we are not yet passing the protagonist when instantiating the class:

python3 videogameDC.py
Traceback (most recent call last):
  File "videogameDC.py", line 19, in <module>
    smw = Videogame("Super Mario World", "Nintendo", 1990)
TypeError: __init__() missing 1 required positional argument: 'protagonist'

We have two ways to solve this challenge:

  1. Change all of the places where the class is being instantiated (which, on a large project, can be a daunting task)
    OR
  2. Assign a default value to our new attribute and then progressively change all of the places that will need this data.

Let's dig a little deeper on option number 2 above:

@dataclass(order=True)
class Videogame:
    sort_index: int = field(init=False, repr=False)
    title: str
    developer: str
    release: int
    protagonist: str = "N/A"

    def __post_init__(self):
        self.sort_index = self.release




smw = Videogame("Super Mario World", "Nintendo", 1990)
mgs = Videogame("Metal Gear Solid", "Sony", 1998, "Solid Snake")
halo = Videogame("Halo", "Microsoft", 2001, "Master Chief")

We have added a default value to our protagonist attribute (protagonist: str = "N/A").  We still do not know who the protagonist is in "Super Mario World" (can anyone help us? 😜), but we do know the values for "Metal Gear Solid" and "Halo", so we include them in the instantiation parameters.  This is what the code above will produce:

python3 videogameDC.py
Videogame(title='Super Mario World', developer='Nintendo', release=1990, protagonist='N/A')
Videogame(title='Metal Gear Solid', developer='Sony', release=1998, protagonist='Solid Snake')
Videogame(title='Halo', developer='Microsoft', release=2001, protagonist='Master Chief')

Notice that the protagonists for Metal Gear Solid and Halo are both set, as we knew about them.  In the case of Super Mario World, the script used the default "N/A" protagonist value from our data class, since we did not provide one.
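
Once we are ready to update a call site, there are a couple of ways to supply the new value.  The sketch below shows passing it by keyword, and using replace() from the dataclasses module to build an updated copy of an existing instance.  The value "Mario" and the variable names smw_updated and smw_filled are our own illustrative choices.

```python
from dataclasses import dataclass, field, replace


@dataclass(order=True)
class Videogame:
    sort_index: int = field(init=False, repr=False)
    title: str
    developer: str
    release: int
    protagonist: str = "N/A"

    def __post_init__(self):
        self.sort_index = self.release


# Old call site: still works and falls back to the default
smw = Videogame("Super Mario World", "Nintendo", 1990)

# Updated call site: pass the new attribute by keyword
smw_updated = Videogame("Super Mario World", "Nintendo", 1990, protagonist="Mario")

# replace() builds a new instance, copying every field we do not override
smw_filled = replace(smw, protagonist="Mario")

print(smw)
print(smw_updated)
print(smw_filled == smw_updated)
```

Note that replace() leaves the original smw untouched and runs __post_init__ again on the new instance, so sort_index is set there as well.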


Data classes have a lot more to offer than just the implementations above, and this was just a primer on this amazing functionality.

I hope this nugget helps with your Python knowledge, and with understanding the potential behind data classes.

Thoughts? Comments? tweet @uberdronis.

cheers!


The entire script can be found in the moshpit repo within my GitHub account, directly linked here.

Featured image created by Franki Chamaki, downloaded from unsplash, licensed under the unsplash license.