Friday, March 27, 2009

Python Notes – 9 : Serialization

Welcome to the ninth note in our Python learning process. We previously talked about files and how to handle them, but we only wrote and read primitive data types such as integers and strings. We also talked about objects and classes. Now, what if we want to write a compound data type or a complex object to a file? This note covers writing objects to files, which is called object serialization.

pickle

The pickle module is a built-in Python module that performs object serialization and de-serialization. To store a data structure, import the module, open a file for writing, use the dump method, and then close the file in the usual way:

>>> import pickle

>>> f = open("test.pck","w")

>>> pickle.dump(12.3, f)

>>> pickle.dump([1,2,3], f)

>>> f.close()

Then we can open the file for reading and load the data structures we dumped:

>>> f = open("test.pck","r")

>>> x = pickle.load(f)

>>> x

12.3

>>> type(x)

<type 'float'>

>>> y = pickle.load(f)

>>> y

[1, 2, 3]

>>> type(y)

<type 'list'>

Each time we invoke load, we get a single value from the file, complete with its original type.
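
The round trip above can also be written as one self-contained script. This is only a minimal sketch, written for Python 3, where pickle files must be opened in binary mode ("wb" and "rb"). The temporary directory is an assumption made so that the script leaves nothing behind.

```python
import os
import pickle
import tempfile

# Assumed location: a throwaway directory, so the sketch leaves nothing behind.
path = os.path.join(tempfile.mkdtemp(), "test.pck")

# Dump two values, one after the other, into the same file.
with open(path, "wb") as f:
    pickle.dump(12.3, f)
    pickle.dump([1, 2, 3], f)

# Each load() call returns the next value, in the order it was dumped.
with open(path, "rb") as f:
    x = pickle.load(f)
    y = pickle.load(f)

print(x, type(x))
print(y, type(y))
```

Using with blocks closes the file even if dump or load raises an exception.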

What can be serialized and de-serialized

The following types can be serialized and de-serialized using pickle:

  • None, True, and False
  • integers, long integers, floating point numbers, complex numbers
  • normal and Unicode strings
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable
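
To get a feel for the list above, the following sketch pickles one sample of most of these kinds of objects and checks that each comes back unchanged. It is written for Python 3, which has no separate long integer or Unicode string type. The choice of len, textwrap.dedent and Fraction as samples is only illustrative.

```python
import pickle
import textwrap
from fractions import Fraction

samples = [
    None, True, False,            # singletons
    42, 3.14, 2 + 3j,             # numbers
    "text",                       # string
    (1, 2), [1, 2], {1, 2},       # tuple, list, set
    {"k": [1, 2]},                # dictionary of picklable objects
    len,                          # built-in function
    textwrap.dedent,              # function defined at the top level of a module
    Fraction,                     # class defined at the top level of a module
    Fraction(1, 3),               # instance of such a class
]

# dumps() returns the pickled bytes; loads() rebuilds the object from them.
restored = [pickle.loads(pickle.dumps(s)) for s in samples]

for before, after in zip(samples, restored):
    print(repr(before), "->", repr(after))
```

Functions and classes are pickled by name only, so the module that defines them must be importable when the data is loaded.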

Things to consider when using pickle

  • Attempts to pickle unpicklable objects will raise the PicklingError exception; when this happens, an unspecified number of bytes may have already been written to the underlying file.
  • Trying to pickle a highly recursive data structure may exceed the maximum recursion depth; a RuntimeError will be raised in this case. You can carefully raise this limit with sys.setrecursionlimit().
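
One way to avoid a half-written file is to pickle to an in-memory string first and only write it once pickling has succeeded. The sketch below assumes Python 3; safe_dumps is a hypothetical helper name, and catching AttributeError and TypeError next to PicklingError is a precaution, because some unpicklable objects raise those instead.

```python
import pickle

def safe_dumps(obj):
    """Return the pickled bytes, or None if obj cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        print("cannot pickle %s: %s" % (type(obj).__name__, exc))
        return None

good = safe_dumps({"a": [1, 2, 3]})
bad = safe_dumps(lambda v: v + 1)   # a lambda has no importable name

print(good is not None)
print(bad is None)
```

Only when safe_dumps returns bytes would the program open the target file and write them, so a failure never touches the file.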

cPickle

cPickle is an optimized version of pickle written in C, so it can be up to 1000 times faster than pickle.
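
A common pattern is to try cPickle first and fall back to pickle, since both offer the same dump and load functions. The sketch below also runs on Python 3, where there is no cPickle module and pickle uses its C implementation automatically when available.

```python
try:
    import cPickle as pickle   # Python 2: the C implementation
except ImportError:
    import pickle              # Python 3: no cPickle module, pickle is already accelerated

data = pickle.dumps({"x": 1})
value = pickle.loads(data)
print(value)
```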

marshal

The marshal module can also be used for serialization. marshal is similar to pickle, but it is intended only for simple objects: it can't handle recursion or class instances. On the plus side, it's pretty fast if you just want to save simple objects to a file. Data is stored in a binary, architecture-independent format. To serialize:

import marshal

marshal.dump(obj,file)                  # Write obj to file

To unserialize:

obj = marshal.load(file)
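
A minimal sketch of both points, assuming Python 3: marshal round-trips a structure built from simple types, but refuses a class instance. The record contents and the Thing class are made up for the illustration, and dumps and loads are used instead of dump and load to avoid needing a file.

```python
import marshal

record = {"name": "spam", "values": [1, 2.5, "three"], "flag": True}

blob = marshal.dumps(record)          # serialize to a bytes string
copy_of_record = marshal.loads(blob)  # rebuild the structure
print(copy_of_record == record)

class Thing(object):                  # hypothetical class, only for the demonstration
    pass

try:
    marshal.dumps(Thing())
    instance_ok = True
except ValueError as exc:             # marshal reports an unmarshallable object
    instance_ok = False
    print("marshal refused the instance:", exc)
```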

shelve

The shelve module provides a persistent dictionary. It works like a dictionary, but the data is stored on disk.

Keys must be strings. Data can be any object serializable with pickle.

import shelve

d = shelve.open("data") # Open a 'shelf'

d['foo'] = 42 # Save data

x = d['foo'] # Retrieve data

Shelf operations

d[key] = obj              # Store an object

obj = d[key]              # Retrieve an object

del d[key]                 # Delete an object

d.has_key(key)          # Test for existence of key

d.keys()                   # Return a list of all keys

d.close()                  # Close the shelf
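
The operations above can be combined into a short session. This is a sketch assuming Python 3, where the in operator replaces has_key. The temporary directory and the stored values are only illustrative.

```python
import os
import shelve
import tempfile
from contextlib import closing

# Assumed location: a throwaway directory for the shelf files.
shelf_path = os.path.join(tempfile.mkdtemp(), "data")

# First session: store two objects, then close the shelf.
with closing(shelve.open(shelf_path)) as d:
    d["foo"] = 42
    d["point"] = {"x": 3, "y": 4}

# Second session: the data is still there because it lives on disk.
with closing(shelve.open(shelf_path)) as d:
    foo = d["foo"]                 # retrieve an object
    has_bar = "bar" in d           # test for existence of a key
    keys = sorted(d.keys())        # all keys
    del d["point"]                 # delete an object
    remaining = sorted(d.keys())

print(foo, has_bar, keys, remaining)
```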

In this note we talked about writing objects to files, which is called object serialization. Object serialization is very useful for persisting your application state so that it can resume execution later, for transferring execution to a remote machine, and for many other application scenarios.

Python Notes – 8 : Object-Oriented Basics

Welcome to the eighth note in our Python learning process. This note talks about the object-oriented features of Python.

Classes and Objects

A class definition looks like this:

class Point:

    pass

Class definitions can appear anywhere in a program, but they are usually near the beginning (after the import statements). By creating the Point class, we created a new type, also called Point. The members of this type are called instances of the type or objects. Creating a new instance is called instantiation. To instantiate a Point object, we call a function named Point:

>>> blank = Point()

The variable blank is assigned a reference to a new Point object. A function like Point that creates new objects is called a constructor. If you ask for the type of blank, you get instance:

>>> type(blank)

<type 'instance'>

If you tried to print blank:

>>> print blank

<__main__.Point instance at 0x01922AF8>

The result indicates that blank is an instance of the Point class and it was defined in __main__ . 0x01922AF8 is the unique identifier for this object, written in hexadecimal (base 16).

Attributes

We can add new data to an instance using dot notation:

>>> blank.x = 3.0

>>> blank.y = 4.0

These new data items are called attributes.

>>> print blank.y

4.0

>>> x = blank.x

>>> print x

3.0
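
As a small sketch (Python 3 syntax), the attributes added this way are stored in a per-instance dictionary, which the built-in vars function exposes. Reading an attribute that was never set raises AttributeError unless getattr is given a default. The attribute name z is only an example of a missing attribute.

```python
class Point(object):
    pass

blank = Point()
blank.x = 3.0
blank.y = 4.0

attrs = vars(blank)                  # the instance's attribute dictionary
missing = getattr(blank, "z", None)  # default instead of AttributeError

print(attrs)
print(missing)
```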

Sameness

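To find out if two references refer to the same object, use the is operator. For instances of a class such as Point, which does not define its own comparison method, the == operator performs the same identity check. For example: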

>>> p1 = Point()

>>> p1.x = 3

>>> p1.y = 4

>>> p2 = Point()

>>> p2.x = 3

>>> p2.y = 4

>>> p1 == p2

False

Even though p1 and p2 contain the same coordinates, they are not the same object. If we assign p1 to p2, then the two variables are aliases of the same object:

>>> p2 = p1

>>> p1 == p2

True

This type of equality is called shallow equality because it compares only the references, not the contents of the objects. To compare the contents of the objects (deep equality), we can write our own function, like this:

def samePoint(p1, p2) :

    return (p1.x == p2.x) and (p1.y == p2.y)

Now if we create two different objects that contain the same data, we can use samePoint to find out if they represent the same point.

>>> p1 = Point()

>>> p1.x = 3

>>> p1.y = 4

>>> p2 = Point()

>>> p2.x = 3

>>> p2.y = 4

>>> samePoint(p1, p2)

True
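
Instead of a separate function, a class can define the special method __eq__ so that == itself compares contents, while is keeps comparing identities. The following is a minimal sketch in Python 3 syntax; adding __eq__ to Point is an assumption made for the illustration, not part of the class defined above.

```python
class Point(object):
    # Compare coordinates instead of identities when == is used.
    def __eq__(self, other):
        return (self.x == other.x) and (self.y == other.y)

p1 = Point()
p1.x, p1.y = 3, 4
p2 = Point()
p2.x, p2.y = 3, 4

same_object = p1 is p2        # identity check: two distinct objects
same_contents = p1 == p2      # calls Point.__eq__: same coordinates

p2 = p1                       # now p2 is an alias of p1
alias_is_same = p1 is p2

print(same_object, same_contents, alias_is_same)
```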

Copying

Aliasing can make a program difficult to read because changes made in one place might have unexpected effects in another place. Copying an object is often an alternative to aliasing. The copy module contains a function called copy that can duplicate any object:

>>> import copy

>>> p1 = Point()

>>> p1.x = 3

>>> p1.y = 4

>>> p2 = copy.copy(p1)

>>> p1 == p2

False

>>> samePoint(p1, p2)

True

copy works fine for objects that don't contain any embedded objects. If the object contains references to other objects, copy duplicates only those references, so both copies end up sharing the same embedded objects.

You can use deepcopy, which copies not only the object but also any embedded objects. For example, if b1 is an object that contains references to other objects:

>>> b2 = copy.deepcopy(b1)

Now b1 and b2 are completely separate objects.
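
The difference shows up as soon as an embedded object is modified. In the sketch below (Python 3 syntax), Rectangle is a hypothetical class whose corner attribute refers to a Point. Both classes and their attribute names are assumptions made for the illustration.

```python
import copy

class Point(object):
    pass

class Rectangle(object):      # hypothetical container class
    pass

b1 = Rectangle()
b1.width = 100.0
b1.corner = Point()           # embedded object
b1.corner.x = 0.0
b1.corner.y = 0.0

shallow = copy.copy(b1)       # new Rectangle, same corner object
deep = copy.deepcopy(b1)      # new Rectangle with its own corner

b1.corner.x = 5.0             # change the embedded object through b1

print(shallow.corner is b1.corner, shallow.corner.x)
print(deep.corner is b1.corner, deep.corner.x)
```

The shallow copy sees the change because it shares the corner with b1; the deep copy keeps the original value.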

The initialization method

The initialization method is a special method that is invoked when an object is created. The name of this method is __init__.

class point:

    def __init__(self, x = 0, y = 0):

        self.x = x

        self.y = y

When we invoke the point constructor, the arguments we provide are passed along to __init__:

>>> first = point(5,7)

>>> first.x

5

>>> first.y

7

Because the parameters are optional, we can omit them:

>>> second = point()

>>> second.x

0

>>> second.y

0

We can also provide a subset of the parameters by naming them explicitly:

>>> third = point(y=10)

>>> third.x

0

>>> third.y

10

The __str__ method

Python calls the __str__ method of a class whenever an operation needs an instance converted to a string, for example print. The syntax looks like this:

class xyz:

    def __str__(self):

        return "Our class xyz"

>>> a = xyz()

>>> a

<__main__.xyz instance at 0x02627300>

>>> print a

Our class xyz
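
print is not the only operation that triggers __str__. As a short sketch in Python 3 syntax, calling str directly or using the %s format code goes through the same method.

```python
class xyz(object):
    def __str__(self):
        return "Our class xyz"

a = xyz()
as_text = str(a)              # explicit conversion calls __str__
formatted = "value: %s" % a   # %s formatting calls __str__ as well

print(as_text)
print(formatted)
```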

Instances as parameters

You can pass an instance as a parameter in the usual way. For example:

def printPoint(p):

    print '(' + str(p.x) + ', ' + str(p.y) + ')'

Instances as return values

Functions can return instances. For example:

def sumPoints(A,B):

    Z = Point()

    Z.x = A.x + B.x

    Z.y = A.y + B.y

    return Z
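
The two ideas, instances as parameters and instances as return values, can be tried together. This is a sketch in Python 3 syntax. It assumes a Point class with the __init__ method shown earlier, and the names format_point and sum_points are illustrative variants of printPoint and sumPoints; format_point returns the text instead of printing it.

```python
class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

def format_point(p):
    # Takes an instance as a parameter and returns its text form.
    return "(" + str(p.x) + ", " + str(p.y) + ")"

def sum_points(a, b):
    # Builds and returns a new instance.
    z = Point()
    z.x = a.x + b.x
    z.y = a.y + b.y
    return z

total = sum_points(Point(3, 4), Point(5, 7))
print(format_point(total))
```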

Operator overloading

Operator overloading means changing the definition and behavior of the built-in operators when they are applied to user-defined types. For example, to override the addition operator +, we provide a method named __add__ in our Point class:

class Point:

    def __init__(self, x = 0, y = 0):

        self.x = x

        self.y = y

    def __add__(self, other):

        return Point(self.x + other.x, self.y + other.y)

The first parameter is the object on which the method is invoked. The second parameter is conveniently named other to distinguish it from self. Now, when we apply the + operator to Point objects, Python invokes __add__ (the printed result assumes that Point also defines a __str__ method that formats the coordinates):

>>> p1 = Point(3, 4)

>>> p2 = Point(5, 7)

>>> p3 = p1 + p2

>>> print p3

(8, 11)

The expression p1 + p2 is equivalent to p1.__add__(p2), but obviously more elegant. You can change the behavior of many operators by overloading their respective methods, which are listed at http://www.python.org/doc/2.2/ref/numeric-types.html
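
For the session above to print (8, 11), the class needs a constructor and a __str__ method in addition to __add__. Here is a complete sketch in Python 3 syntax. The __str__ format and the extra __sub__ method (the - operator) are assumptions added for the illustration.

```python
class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)

    def __add__(self, other):
        # Called for self + other.
        return Point(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        # Called for self - other.
        return Point(self.x - other.x, self.y - other.y)

p1 = Point(3, 4)
p2 = Point(5, 7)
p3 = p1 + p2
p4 = p2 - p1

print(p3)
print(p4)
```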

Inheritance

Inheritance is the ability to define a new class that is a modified version of an existing class. The new class inherits all of the methods of the existing class and is called a child class or subclass. The syntax looks like this:

class class1(object):

    k = 7

    def __init__(self, color='green'):

        self.color = color

    def Hello1(self):

        print "Hello from class1"

    def printColor(self):

        print "preferred", self.color

class class2(class1):

    def Hello2(self):

        print "Hello from class2"

        print self.k

Here class2 is the child of class1.

>>> c1 = class1('blue')

>>> c2 = class2('red')

>>> c1.Hello1()

Hello from class1

>>> c2.Hello2()

Hello from class2

7

A child class can access the methods of its parent class:

>>> c2.Hello1()

Hello from class1

The parent constructor is called automatically for child classes that do not define their own __init__, as follows:

>>> c1.printColor()

preferred blue

>>> c2.printColor()

preferred red
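
The automatic call only happens when the child class has no __init__ of its own. If the child defines one, it replaces the parent's version and has to call it explicitly. The sketch below uses Python 3 syntax; Base, Inherits, Overrides and the size attribute are hypothetical names used only here.

```python
class Base(object):
    def __init__(self, color="green"):
        self.color = color

class Inherits(Base):
    # No __init__ here, so Base.__init__ runs when an instance is created.
    pass

class Overrides(Base):
    def __init__(self, color, size):
        Base.__init__(self, color)   # explicit call to the parent constructor
        self.size = size

a = Inherits("red")
b = Overrides("blue", 3)

print(a.color)
print(b.color, b.size)
```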

You can check whether a class has a given method or attribute using the hasattr function:

if hasattr(class1, "Hello2"):

    print c1.Hello2()

else:

    print "Class1 does not contain method Hello2()"

Class1 does not contain method Hello2()

To check the inheritance relation between two class :

if issubclass(class2, class1):

    print "Class2 is a subclass of Class1"
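
These checks can be collected in one small script. The sketch below uses Python 3 syntax with cut-down versions of class1 and class2 whose methods return their text instead of printing it. The isinstance call is an addition that performs the same kind of check on an object rather than on a class.

```python
class class1(object):
    def Hello1(self):
        return "Hello from class1"

class class2(class1):
    def Hello2(self):
        return "Hello from class2"

c2 = class2()

parent_has_hello2 = hasattr(class1, "Hello2")   # the parent does not see child methods
child_has_hello1 = hasattr(class2, "Hello1")    # the child inherits parent methods
is_sub = issubclass(class2, class1)             # relation between two classes
is_inst = isinstance(c2, class1)                # relation between an object and a class

print(parent_has_hello2, child_has_hello1, is_sub, is_inst)
```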

In this note we tried to cover as many of Python's object-oriented features as we could. We will give the topic a more advanced note in the future.