5.5. Classes and class attributes

5.5.1. Why classes?

The motivation for classes is the same as the motivation for data types. There is a set of computations that have similar subparts and similar data storage needs, and therefore we want uniform ways of computing them. Thus, all containers contain elements, and the in operator relates all containers to their elements; all lists are mutable sequences, so there is a shared append method used to add elements at the end. In general, we want objects with shared properties to share methods.

When writing a module (a collection of functions organized around one task), it often happens that we can think of the task as revolving around some kind of computational object. There are two related reasons for analyzing the problem in these terms:

  1. There is a shared bundle of information involved in each computation.

  2. Once we have such a bundle of attributes, there are ways to compute further useful information.

In that case, we can think of the bearer of the bundle of information as an instance of a class of objects, and the various kinds of computation as methods of that class. Reasons 1 and 2 could equally well justify writing a single function that takes the bundle of information as its arguments. Since defining a class involves some overhead, a class is generally worthwhile only when reason 2 involves multiple computations: that is, when there are several different things to compute, all involving the same, or nearly the same, bundle of attributes. The case for a class grows even stronger when some notion of state is involved; that is, when parts of the bundle of information may change over time as we perform different computations, but we still want essentially the same methods to apply. We then want a representation of that bundle that persists over time.
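As a rough illustration of the state argument, here is a minimal sketch using a hypothetical RunningTotal class (not part of the original text). The bundle of information (a count and a total) changes as values arrive, yet the same methods keep applying to it.

```python
class RunningTotal:
    """Hypothetical example: a bundle of state plus several computations on it."""

    def __init__(self):
        # The shared bundle of information
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # Update the state; later method calls see the new values
        self.count += 1
        self.total += value

    def mean(self):
        # A further computation derived from the same bundle
        return self.total / self.count if self.count else 0.0


scores = RunningTotal()
for value in [2, 4, 9]:
    scores.add(value)
print(scores.count, scores.mean())  # 3 5.0
```

With plain functions, count and total would have to be passed in and handed back on every call; the instance keeps them between calls.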

The most transparent examples are those that correspond to objects in the world.

Suppose we wanted to represent some information about a person. We might do this with a dictionary, as we did in Section Dictionaries:

gvr = dict(name='Guido van Rossum',
           language='Python',
           favorite_tv_show='Monty Python',
           favorite_desert='Dutch Apple pie',
           email_address='gvr@gmail.com')

An alternative is to use a Person class, to be defined below, that bundles all this information together as attributes of each Person instance. We would then accomplish the same task with the following sort of statement:

gvr = Person(name='Guido van Rossum',
             language='Python',
             favorite_tv_show='Monty Python',
             favorite_desert='Dutch Apple pie',
             email_address='gvr@gmail.com')

If this were the only kind of information that needed to be represented, the decision to use a class definition would basically be a matter of taste. But suppose there were certain computational tasks associated with people that we routinely wanted to get done, such as sending them email. If gvr is an instance of a class, that functionality can be packaged as a method of the class; with a basic Python dictionary, it would have to live in a separate function that takes the dictionary as an argument, which is somewhat more awkward. In the class case, we would do something like:

gvr.send_email('How about lunch?','mark.gawron@gmail.com')
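To make the contrast concrete without needing a mail server, here is a sketch in which the task is a hypothetical describe computation rather than sending email, and Person is a cut-down, hypothetical version of the class. With the dictionary, the behavior sits in a free-standing function; with the class, it travels with the data as a method.

```python
# Dictionary approach: the behavior is a separate function that takes the dict
def describe_person(person):
    return "%s programs in %s" % (person['name'], person['language'])

gvr_dict = dict(name='Guido van Rossum', language='Python')
print(describe_person(gvr_dict))


# Class approach: the behavior is a method bundled with the data
class Person:
    def __init__(self, name, language):
        self.name = name
        self.language = language

    def describe(self):
        return "%s programs in %s" % (self.name, self.language)

gvr = Person(name='Guido van Rossum', language='Python')
print(gvr.describe())
```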

As a more computational example, consider a web server. A web server opens dialogues with many clients all over the web and may need to hold many conversations simultaneously. Each client has a similar bundle of information, with different values filled in: for example, a URL, authentication information, a user name, a pending request, and so on. A very natural way to implement this sort of functionality is through a class with methods like send_request and respond, with one instance per client.
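A minimal sketch of that idea follows. The ClientSession class and its receive and respond methods are hypothetical, and no real networking is done; the point is only that each client gets its own instance, so each conversation keeps its own pending requests.

```python
class ClientSession:
    """Hypothetical sketch: one instance per client conversation."""

    def __init__(self, url, user_name):
        self.url = url
        self.user_name = user_name
        self.pending_requests = []   # state that changes over the conversation

    def receive(self, request):
        # Record a request from this client only
        self.pending_requests.append(request)

    def respond(self):
        # Answer this client's oldest pending request
        request = self.pending_requests.pop(0)
        return "%s: handled %r" % (self.user_name, request)


alice = ClientSession('http://example.com/a', 'alice')
bob = ClientSession('http://example.com/b', 'bob')
alice.receive('GET /index.html')
bob.receive('GET /about.html')
print(alice.respond())            # alice: handled 'GET /index.html'
print(len(bob.pending_requests))  # 1
```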

5.5.2. A small example

Some of the basic features of class definitions can be illustrated with a simple case in which there are two basic attributes. Consider some application in which there is a plane (perhaps a display, perhaps a plot of some data) and points to be placed on it. Each point will have an x coordinate and a y coordinate, so x and y will be our attributes:

import math

class Point:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point(%d, %d)" % (self.x, self.y)

    def distance_from_origin(self):
        return math.sqrt(self.x**2 + self.y**2)

    def distance(self, p2):
        return math.sqrt((p2.x - self.x)**2 + (p2.y - self.y)**2)

The class definition statement class Point is followed by a block of code containing what look like four function definitions. These are the methods of the class. A class definition specifies a class of objects, generally used to create instances with individually varying properties that are nonetheless alike enough that it makes sense to define shared methods for all of them. The first two methods above have special names recognized by Python. The __init__ method is executed whenever an instance of class Point is created. It has three parameters: the one named self refers to the point instance being created; the other two must be supplied when a point is created (as shown below) and become the coordinates of the point. The __str__ method is executed whenever a Point needs to be converted to a string, for example when it is printed. The remaining two, distance_from_origin and distance, are ordinary user-defined methods specific to points. To illustrate:

>>> p1 = Point(3,4)
>>> p2 = Point(1,1)
>>> p1.x
3
>>> p2.x
1
>>> print(p1)
Point(3, 4)
>>> print(p2)
Point(1, 1)
>>> p1.distance_from_origin()
5.0
>>> p2.distance_from_origin()
1.4142135623730951
>>> p1.distance(p2)
3.605551275463989

In the first line, a point is created. As with basic Python data types like string and list, the name of the class is also used as a function for creating instances of the class. The new wrinkle is that the particulars of what should happen when a Point is created are specified in the __init__ method. The __init__ method says that the two arguments x and y fill the x and y attributes of the point instance being created. The expressions p1.x and p2.x retrieve the x attributes of two different point instances, and we see that different Point instances have different attribute values.

An important property of all the method definitions in the Point class is that each has a parameter named self. The self parameter always refers to the point instance calling the method, and through it the method can access the bundle of information associated with that instance. Thus, when p1 calls the distance_from_origin method, it looks like a function call with no arguments, but it is really a function call with one argument: p1, written to the left of the period, is used as the value of the self parameter. Since the x and y coordinates of p1 are used in computing the distance from the origin, and since p1 and p2 have different coordinates, we get different distances for p1 and p2. In general, methods have a self parameter that is filled by the instance calling the method, and that instance is written to the left of the period when the method is called. Strictly speaking, Python requires only that the first parameter receive the instance; naming it self is a convention, but one that is almost universally followed, and following it is highly recommended.
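One way to see what happens to self is to look the method up on the class and pass the instance explicitly. The sketch below repeats just enough of the Point class to run on its own; both calls compute the same thing.

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return math.sqrt(self.x**2 + self.y**2)

p1 = Point(3, 4)

# Usual method-call syntax: p1 is passed implicitly as self
print(p1.distance_from_origin())       # 5.0
# The same call spelled out: the function is found on the class, p1 passed explicitly
print(Point.distance_from_origin(p1))  # 5.0
```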

The Person class assumed in the example above would look like this:

import smtplib
from email.mime.text import MIMEText

class Person:

    def __init__(self, name, language, favorite_tv_show, favorite_desert,
                 email_address):
        self.name = name
        self.language = language
        self.favorite_tv_show = favorite_tv_show
        self.favorite_desert = favorite_desert
        self.email_address = email_address

    def send_email(self, text, sender, subject=''):
        # Build a plain-text email message addressed to this person
        msg = MIMEText(text)
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = self.email_address

        # Hand the message to the SMTP server on the local machine
        s = smtplib.SMTP('localhost')
        s.sendmail(sender, [self.email_address], msg.as_string())
        s.quit()

Note

The code above assumes there is an SMTP server running on the local machine (most personal computers won’t be running one).
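Because actually sending requires a server, one option is to separate building the message from sending it, so the building step can be checked without one. The sketch below is a trimmed-down Person with a hypothetical build_email method; it uses only the standard email.mime.text.MIMEText class and sends nothing.

```python
from email.mime.text import MIMEText

class Person:
    """Trimmed-down sketch: only the attributes needed for email."""

    def __init__(self, name, email_address):
        self.name = name
        self.email_address = email_address

    def build_email(self, text, sender, subject=''):
        # Hypothetical helper: build (but do not send) a message to this person
        msg = MIMEText(text)
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = self.email_address
        return msg

gvr = Person('Guido van Rossum', 'gvr@gmail.com')
msg = gvr.build_email('How about lunch?', 'mark.gawron@gmail.com', subject='Lunch')
print(msg['To'])  # gvr@gmail.com
```

A send_email method could then call build_email and pass the result to an SMTP connection.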

5.5.3. Classes, namespaces, and attributes

There are a lot of similarities between functions and class methods. Both methods and functions can be called: they can be told to execute their code by writing their names followed by parentheses (). Hence, they’re called callables.

But the similarity in Python goes a little further. Notice the similarity of the syntax used in calling a function defined in a module and in calling a method of a class instance.

After importing the module collections, to call Counter, which is defined in the collections module, we write (Counter is actually a class, but it is called just like a function):

>>> collections.Counter()

After defining the class Point, and creating an instance p1, to call the distance_from_origin method for p1, we do:

>>> p1.distance_from_origin()

In both cases we have the syntax:

<name>.<callable>()

In fact, in both cases the syntax signals that name has its own namespace. A module’s initial namespace contains all the functions, classes, and global variables defined in it. The names available through a class instance include all the methods and attributes defined in the class definition. Hence, forgetting to access the right namespace, by not using the name.callable() syntax, raises the same kind of error in both cases:

>>> p1 = Point(1,4)
>>> distance_from_origin(p1)
...
NameError: name 'distance_from_origin' is not defined
>>> import collections
>>> Counter()
...
NameError: name 'Counter' is not defined
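Strictly speaking, the methods are not copied into each instance. The sketch below uses the built-in vars function to show that an instance’s own namespace holds only the attributes assigned to it, while the methods live in the class’s namespace; roughly speaking, attribute lookup on an instance checks the instance first and then its class.

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return math.sqrt(self.x**2 + self.y**2)

p1 = Point(1, 4)

# The instance's own namespace holds only the attributes set on it
print(vars(p1))                               # {'x': 1, 'y': 4}
# Methods live in the class namespace and are found there during lookup
print('distance_from_origin' in vars(Point))  # True
print('distance_from_origin' in vars(p1))     # False
```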

Because each class has its own namespace, two different classes can use the same method name with different definitions, and there is no name clash.
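For instance, in the following sketch two hypothetical classes, Circle and Square, each define their own area method. Which definition runs depends on the class of the instance to the left of the period.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        # Same method name as Circle.area, different definition, no clash
        return self.side ** 2

for shape in [Circle(1), Square(2)]:
    print(type(shape).__name__, shape.area())
```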

The names used in namespaces do not have to refer to callables. Any name denoting any kind of Python object can belong to a namespace. The general term for a name in a namespace, whether it refers to a callable or not, is attribute. Python modules can come with predefined names that refer to all kinds of noncallables. For example, the Python module string defines a number of useful character sets as attributes:

>>> import string
>>> string.ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

New names can always be added to a namespace, and existing names rebound, by simple assignment. Although there is rarely a good reason to do so, one can redefine an existing variable or define a new variable within an imported module as follows:

>>> string.ascii_letters = ''
>>> string.roman_numeral_chars = 'ivxlcm'
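One reason to be wary of this: importing a module a second time does not re-run its file but returns the same cached module object, so an assignment like the one above is seen by every part of the program that uses string. A short sketch (it removes the added name at the end so nothing leaks):

```python
import string

# Add a new attribute to the already-imported module object
string.roman_numeral_chars = 'ivxlcm'

# A second import returns the same cached module object, so the change is visible
import string as string_again
print(string_again is string)            # True
print(string_again.roman_numeral_chars)  # ivxlcm

# Clean up so the change does not affect other code
del string.roman_numeral_chars
```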

More naturally, one might choose to store particular attributes on an instance. Our __init__ method for creating instances of the class Point takes two arguments, x and y, and stores them as the x coordinate and y coordinate attributes of the point being created with the lines:

self.x = x
self.y = y

Here, self is the name of the Point instance being created, and x and y the names of the attributes we want to store the x and y values in. We can access the attributes of a point instance the same way we access a callable attribute, but without the parentheses. Hence:

>>> p1 = Point(3,4)
>>> p1.x
3
>>> p1.y
4
>>> p2 = Point(1,4)
>>> p2.x
1
>>> p2.y
4
>>> p1.x == p2.x
False
>>> p1.y == p2.y
True

Point attributes don’t have to be universal properties of every instance of the class. For instance, if some points, but not all, have colors, we might say of one such point:

>>> p1.color = 'red'

>>> p1.color
'red'

In this case, the attribute name color is unique to p1’s namespace, and if we ask for the color of another point, we don’t get a name error, but we get a very closely related error called an attribute error:

>>> p2.color
...
AttributeError: 'Point' object has no attribute 'color'

That is, the name color is not defined in p2’s namespace. We get the same kind of error if we ask a module for a name it does not define:

>>> collections.foo
...
AttributeError: 'module' object has no attribute 'foo'
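When an attribute such as color is optional, the built-in functions hasattr and getattr can be used to avoid the AttributeError. A minimal sketch, with the Point class cut down to its __init__ method:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = Point(3, 4)
p2 = Point(1, 1)
p1.color = 'red'

# hasattr reports whether the name can be found, without raising AttributeError
print(hasattr(p1, 'color'), hasattr(p2, 'color'))  # True False
# getattr with a third argument returns that default instead of raising
print(getattr(p2, 'color', 'no color'))            # no color
```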

We know how to define callables that belong to a particular namespace. For a function foo belonging to a module mod, we just write:

def foo():
    pass  # definition goes here

in the file mod.py defining the module.

For a method foo belonging to a class Cls, we just write the method definition of foo (properly indented) in the block of code defining the class:

class Cls:

    def foo(self):
        pass  # definition goes here

How do we place attributes that aren’t callable in particular namespaces? The answer is: with simple variable assignments. Thus, the string module is defined in a file string.py, and that file contains variable assignments equivalent to:

ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

Hence when you import the string module, the names string.ascii_letters and string.punctuation are available with the above values.

Similarly, if we want every point to have the color “yellow” unless told otherwise, we can put an assignment in the body of the class, making color a class attribute:

import math

class Point:

    color = "yellow"

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "Point(%d, %d)" % (self.x, self.y)

    def distance_from_origin(self):
        return math.sqrt(self.x**2 + self.y**2)

    def distance(self, p2):
        return math.sqrt((p2.x - self.x)**2 + (p2.y - self.y)**2)

Note that we can still set the color of individual points and have the following sorts of points:

>>> p1 = Point(3,4)
>>> p2 = Point(1,1)
>>> p1.color = "red"
>>> p1.color
'red'
>>> p2.color
'yellow'

5.5.4. Census data example