Friday, March 27, 2009

Python Notes – 9 : Serialization

Welcome to our ninth note in our Python learning process. We talked previously about files and how to handle it but we talked about writing and reading only the primitive data types as integers and strings. We also talked about objects and classes. Now, what if we want to write a compound data type or a complex object to a file. This note will talk about writing objects to files, which is called object serialization.

pickle

The pickle module is a Python built-in module that object serialization and de-serialization. To store a data structure, use the dump method and then close the file in the usual way:

>>> pickle.dump(12.3, f)

>>> pickle.dump([1,2,3], f)

>>> f.close()

Then we can open the file for reading and load the data structures we dumped:

>>> f = open("test.pck","r")

>>> x = pickle.load(f)

>>> x

12.3

>>> type(x)

<type 'float'>

>>> y = pickle.load(f)

>>> y

[1, 2, 3]

>>> type(y)

<type 'list'>

Each time we invoke load, we get a single value from the file, complete with its original type.

What can be serialized and de-serialized

The following types can be serialized and de-serialized using pickle:

  • None, True, and False
  • integers, long integers, floating point numbers, complex numbers
  • normal and Unicode strings
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose__dict__ or __setstate__() is picklable

Things to consider when using pickle

  • Attempts to pickle unpicklable objects will raise the picklingError exception; when this happens, an unspecified number of bytes may have already been written to the underlying file.
  • Trying to pickle a highly recursive data structure may exceed the maximum recursion depth, a RuntimeError will be raised in this case. You can carefully raise this limit with sys.setrecursionlimit().

cPickle

The cPickle is an optimized version of pickle written in C, so it can be up to 1000 faster than pickle.

marshal

The marshal module can also be used for serialization. Marshal is similar to pickle, but is intended only for simple objects. Can’t handle recursion or class instances. On the plus side, it’s pretty fast if you just want to save simple objects to a file. Data is stored in a binary architecture independent format.To serialize:

import marshal

marshal.dump(obj,file)                  # Write obj to file

To unserialize:

obj = marshal.load(file)

shelve

The shelve module provides a persistent dictionary. It is works like a dictionary, but data is stored on disk.

Keys must be strings. Data can be any object serializable with pickle.

import shelve

d = shelve.open("data") # Open a ’shelf’

d[’foo’] = 42 # Save data

x = d[’bar’] # Retrieve data

Shelf operations

d[key] = obj              # Store an object

obj = d[key]              # Retrieve an object

del d[key]                 # Delete an object

d.has_key(key)          # Test for existence of key

d.keys()                   # Return a list of all keys

d.close()                  # Close the shelf

In this note will talked about writing objects to files, which is called object serialization. Object serialization is very useful in persisting your application logic to resume its execution later, transfer of execution to remote machine, and many other applications scenarios.

No comments: