Wednesday, August 26, 2009

Wisdom of Experts: Programming in vs. Programming into a language

David Gries (1981) pointed out that “your programming tools don’t have to determine how you think about programming”. He also drew a distinction between programming in a language and programming into a language.

Programmers who program “in” a language limit their thoughts to constructs that the language directly supports. If the language tools are primitive, the programmer’s thoughts will also be primitive.

Programmers who program “into” a language first decide what thoughts they want to express, and then determine how to express those thoughts using the tools provided by their specific language.

So, try as much as you can to program into the language you use. If your language lacks constructs that you want to use or is prone to other kinds of problems, try to compensate for them. Invent your own coding conventions, standards, class libraries, and other augmentations.

Saturday, August 15, 2009

Using Python Scripts with IIS 7

Here are the steps to follow if you want to run Python scripts on IIS 7:

  • Make sure Python is installed properly, or refer to Python Notes – 1 : Setup for installation steps.
  • Make sure the CGI module is installed in IIS 7:

Control Panel -> Programs -> Programs and Features -> Turn Windows features on or off -> Internet Information Services -> World Wide Web Services -> Application Development Features -> CGI (ensure that it is selected).

  • Add a web site for Python: in IIS Manager, right-click Sites -> Add Web Site...

  • In the Add Web Site dialog box, set the Site name (e.g. PythonTest), point it to a folder such as C:\PythonTest, then click OK.

  • In Features View, open Handler Mappings

  • On the left pane click Add Script Map ...

  • In Request path, enter "*.py" as the script file extension. In Executable, enter "C:\Python25\Python.exe %s %s", adjusting the path to match your Python installation. (The two "%s" after the executable are required for console-based script interpreters but would not be required for an Internet Server API [ISAPI]-based script interpreter.) Then give the script mapping an appropriate Name, such as Python, and click OK.

  • Create a test.py file in the site folder (C:\PythonTest) and copy the following script into it.

print 'Status: 200 OK'
print 'Content-type: text/html'
print
print '<HTML><HEAD><TITLE>Python Sample CGI</TITLE></HEAD>'
print '<BODY>'
print '<H1>This is a header</H1>'
print '<p>' #this is a comment
print 'See this is just like most other HTML'
print '<br>'
print '</BODY>'
print '</HTML>'

Sunday, May 24, 2009

Introduction to Peer-to-Peer Computing

Peer-to-peer (P2P) has become a buzzword that subsumes concepts used in communication technologies, distributed system models, applications, platforms, etc. It is not a particular initiative, architecture, or specific technology; rather, it describes a set of concepts and mechanisms for decentralized distributed computing and direct peer-to-peer information and data interchange.

So, What is Peer-to-Peer Computing ?

There is no one definition of P2P.

  • P2P is a class of applications that takes advantage of resources available at the edges of the Internet.
  • The sharing of computer resources and services by direct exchange between systems.
  • In Peer-to-Peer computing, each computer can serve the function of both client and server. Any computer in the network can initiate requests (as a client), and receive requests and transmit data (as a server).

We can consolidate these definitions by proposing the following one.

P2P computing is a distributed computing architecture with the following properties:

  • Resource sharing
  • Dual client/server role of network nodes.
  • Decentralization/Autonomy
  • Scalability
  • Robustness/Self-Organization

Once you understand that P2P is a distributed computing architecture, you might start asking: why a P2P architecture when we already have the Client/Server architecture?

Client/Server is a well-known, powerful architecture. In it, the server is the source of data and functionality, and clients request data and processing from it. Although the Client/Server architecture is very successful (the web [HTTP], FTP, Web services, etc.), it has some issues:

  • Scalability is hard to achieve.
  • Presents a single point of failure.
  • Requires administration.
  • It does not utilize the resources at the network edges (clients).

Although many of these issues have been addressed by various solutions, P2P tries to address them in a new and very different way. The following figure emphasizes the general difference between the P2P and client/server architecture models.

Figure: P2P vs. client/server architecture models.

Why P2P Now ?

Since the concepts that underlie the P2P computing model are not new, it is reasonable to ask why the P2P model has burst onto the scene at this time. This happened due to several recent changes in the computing world:

  • The ubiquity of connected computers has come close to enabling anywhere, anytime access to the Net and its resources.
  • The critical mass of computer users.
  • Improvements in communications bandwidth, still on a fast-track growth curve, make it possible to move large amounts of data and rich media content from one location to another.
  • Today’s PCs are sufficiently robust, in terms of processing power and storage capacity, to handle the extra services required in a P2P environment.
  • The emergence of complementary technologies, including recent advances in wireless networking and software agents, provides more avenues for interesting P2P applications.

While all of these are necessary conditions for P2P computing, something more is required. History shows that a trigger is needed for a new technology to really take off. In the computer industry this is often referred to as a “killer app”. The electronic spreadsheet triggered the proliferation of the PC, and the Mosaic browser triggered the transformation of the Internet into the World Wide Web.

The P2P computing model found its trigger with Napster, followed by Gnutella. Their huge popularity got everyone talking about P2P and helped further stimulate other P2P applications such as Freenet and SETI@home.

P2P Benefits

  • Efficient use of resources.
    • Unused bandwidth, storage, processing power at the edge of the network.
  • Scalability
    • Consumers of resources also donate resources.
    • Aggregate resources grow naturally with utilization.
  • Reliability
    • Replicas of data.
    • Geographic distribution
    • No single point of failure.
  • Ease of administration
    • Nodes are self-organizing.
    • No need to deploy servers to satisfy demand.
    • Built-in fault tolerance, replication, and load balancing.

Since P2P is not well defined and is described only by a set of characteristics and properties attributed to P2P systems, we will discuss some of the common characteristics of P2P systems. This does not imply that every P2P system has to comply with all of these characteristics, or even with a fixed number of them; these are general features that can be used to identify P2P systems.

  1. Structural Characteristics
    1. Decentralization
      1. This includes distributed storage, processing, information sharing, etc..
      2. Advantage 
        1. increased extensibility.
        2. Higher system availability and resilience.
      3. Disadvantage
        1. Difficult to get or maintain a global view of the system state.
        2. System behavior is not deterministic.
        3. Interoperability is a big issue.
    2. Self-organizing
      1. The different system components work together without any central management instance assigning roles and tasks.
      2. Disadvantage
        1. It is difficult to determine the system structure or predict the system behavior as long as no system-wide, governing policies apply.
    3. Fault-tolerance
      1. There is no central point of failure.
      2. However, the variance in system performance due to nodes leaving and joining might imply a lack of consistency.
  2. Operational Characteristics
    1. Transparency
      1. This means transparency to the application or user in forms of communications, location, access, replication transparency for data and information.  Also the scale of system should be kept transparent to the user.
      2. This requires a middleware layer.

P2P Applications

P2P is good for:

  • File sharing (Napster, Gnutella, Kazaa)
  • Multiplayer games (Unreal Tournament, DOOM)
  • Collaborative applications (ICQ, shared whiteboards)
  • Distributed computation (Seti@home)
  • Ad-hoc networks

P2P can be applied in many areas, and there is massive ongoing work to utilize P2P concepts and techniques.

Friday, May 15, 2009

Python Notes 14: Advanced Network Operations

We have explored the usual issues in network programming on both the client and server sides. In this post we will discuss some advanced topics in network programming.

Half-Open Sockets
Normally, sockets are bidirectional—data can be sent across them in both directions. Sometimes, you may want to make a socket be unidirectional so data can only be sent in one direction. A socket that's unidirectional is said to be a half-open socket. A socket is made half-open by calling shutdown(), and that procedure is irreversible for that socket. Half-open sockets are useful when

  • You want to ensure that all data written has been transmitted. When shutdown() is called to close the output channel of a socket, it will not return until all buffered data has been successfully transmitted.
  • You want to have a way to catch potential programming errors that may cause the program to write to a socket that shouldn't be written to, or read from a socket that shouldn't be read from.
  • Your program uses fork() or multiple threads, and you want to prevent other processes or threads from doing certain operations, or you want to force a socket to be closed immediately.

The socket shutdown() call is used to accomplish all of these tasks.

The call to shutdown() requires a single argument that indicates how you want to shut down the socket. Its possible values are as follows:

  • 0 to prevent future reads
  • 1 to prevent future writes
  • 2 to prevent future reads and writes

Once shut down in a given direction, the socket can never be reopened in that direction. Calls to shutdown() are cumulative; calling shutdown(0) followed by shutdown(1) will achieve the same effect as calling shutdown(2).
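
As a minimal sketch of this pattern (the host, port, and payload below are placeholders, not from the post), a client can half-close its side with shutdown(1) after sending a request and then read the reply until the server closes the connection:

import socket

host, port = 'localhost', 51423          # Placeholder address
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))
s.sendall("request data")
s.shutdown(1)                            # Half-close: no more writes, reads still allowed
reply = ''
while 1:
    chunk = s.recv(4096)                 # Read until the peer closes its side
    if not chunk:
        break
    reply += chunk
s.close()
print reply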

Timeouts

TCP connections can be held open indefinitely, even if there's no traffic flowing across them. Timeouts are useful
for discovering error conditions or communication problems in some instances.

To enable timeout detection on a Python socket, you call settimeout() on the socket, passing it the number of seconds until a timeout is reached. Later, when you make a socket call and nothing has happened for that amount of time, a socket.timeout exception is raised.
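
For example, a minimal sketch (the address is a placeholder):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5.0)                        # Blocking calls may now raise socket.timeout
try:
    s.connect(('localhost', 51423))      # Placeholder address
    data = s.recv(4096)
except socket.timeout:
    print "No response within 5 seconds"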

Transmitting Strings
One common problem that arises when sending data across the network is that of transmitting variable-length strings. When you read information from a TCP stream you don't know when the sender has finished giving you a piece of data unless you build some sort of indication into your protocol. There are two common approaches to solving this problem:

  • End-of-string identifier
    • Terminate the string with ‘\n’ or NULL
    • Problem: Terminator might occur in the data if we transmit binary data.
    • Solutions:
      • Escape the identifier.
      • Encode data in base64
      • Use a different identifier if the default one appears in the data, and send the new identifier before the data.
  • Leading fixed-length size indicator
    • Send a constant number of bytes containing the size of the string.
    • The “size” itself could be sent as characters or as binary data; characters are simpler, but you have to pad them to a constant length. A sketch of the binary approach follows this list.
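
Here is a minimal sketch of the fixed-length size indicator approach, using the struct module to send the length as 4 bytes in network byte order; the helper function names are made up for illustration:

import struct

def send_string(sock, data):
    # Prefix the payload with its length packed as a 4-byte network-order integer
    sock.sendall(struct.pack('!I', len(data)) + data)

def recv_exactly(sock, n):
    # Keep reading until exactly n bytes have arrived
    buf = ''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed early")
        buf += chunk
    return buf

def recv_string(sock):
    (length,) = struct.unpack('!I', recv_exactly(sock, 4))
    return recv_exactly(sock, length)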

Using Broadcast Data

When you broadcast a UDP packet, it's sent to all machines
connected to your LAN. The underlying transport, such as Ethernet, will have a special mode that lets you do this without having to repeat the packet for each computer.
On the receiver's side, when a broadcast packet is received, the kernel looks at the destination port number. If it has a process listening to that port, the packet is sent to that process. Otherwise, it's silently discarded. Therefore, simply sending out a broadcast packet will not harm or impact machines that don't have a server listening for it.
Broadcast packets are often used for the following types of activities:

  • Automatic service discovery: For instance, a computer might send out a broadcast packet looking for all print servers of a particular type.
  • Automatic service announcements: A server providing a service for a LAN might periodically broadcast the availability of that service. Clients would listen for those broadcasts.
  • Searching for LAN computers that implement a specific protocol. For instance, a chat program might send out a broadcast packet looking for other people on the LAN with the same chat program. It might then compile a list and present it to the user.

To be able to broadcast data, you need to set the socket option on client and server as follows:

s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

On the sender, instead of sending to a particular IP address, send to ‘<broadcast>’:

s.sendto(data, ('<broadcast>', port))
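
Putting the pieces together, a minimal broadcast sender might look like this (the port number and message are arbitrary; a receiver would bind to the same port and call recvfrom()):

import socket

port = 51423                                  # Arbitrary port for this example
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
s.sendto('Anybody out there?', ('<broadcast>', port))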

In this post we dealt with a few advanced issues in network programming.

Python Notes 13: Network servers

For a client, the process of establishing a TCP connection is a two-step process that includes the creation of the socket object and a call to connect() to establish a connection to the server. For a server, the process requires the following four steps:

  1. Create the socket object.
  2. Set the socket options (optional).
  3. Bind to a port (and, optionally, a specific network card).
  4. Listen for connections.

Example of these steps:

import socket

host = ''                 # Bind to all interfaces
port = 51423
# Step 1 (Create the socket object)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Step 2 (Set the socket options)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Step 3 (Bind to a port and interface)
s.bind((host, port))
# Step 4 (Listen for connections)
s.listen(5)

Setting and Getting Socket Options
There are many different options that can be set for a socket. For general-purpose servers, the socket option of greatest interest is called SO_REUSEADDR. Normally, after a server process terminates, the operating system reserves its port for a few minutes, thereby preventing any other process (even another instance of your server itself) from opening it until the timeout expires. If you set the SO_REUSEADDR flag to true, the operating system releases the server port as soon as the server socket is closed or the server process terminates. This is done through:

s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

Binding the Socket

The next step is to claim a port number for the server. This process is called binding. To bind to a port, you call:
s.bind(('', 111))

The first argument to bind() specifies the IP address to bind to. It's generally left blank, which means "bind to all interfaces and addresses."

Listening for Connections
The last step before actually accepting client connections is to call listen(). This call tells the operating system to prepare to receive connections. It takes a single parameter, which indicates how many pending connections the operating system should allow to remain in queue before the server actually gets around to processing them.

Accepting Connections

Most servers are designed to run indefinitely and service multiple connections; this is usually done with a carefully designed infinite loop. Example:

import socket

host = ''                 # Bind to all interfaces
port = 51423
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((host, port))
print "Waiting for connections..."
s.listen(1)
while 1:
    clientsock, clientaddr = s.accept()
    print "Got connection from", clientsock.getpeername()
    clientsock.close()

Using User Datagram Protocol

To use UDP on the server, you create a socket, set the options, and bind() just like with TCP. However, there's no need for listen() or accept()—just use recvfrom().
This function actually returns two pieces of information: the received data, and the address and port number of the program that sent the data. Because UDP is connectionless, this is all you need to be able to send back a reply. Example, an echo server:

import socket, traceback

host = ''                 # Bind to all interfaces
port = 51423
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((host, port))
while 1:
    try:
        message, address = s.recvfrom(8192)
        print "Got data from", address
        s.sendto(message, address)   # Echo it back
    except (KeyboardInterrupt, SystemExit):
        raise
    except:
        traceback.print_exc()

In this post we have clarified some points in network servers.

Python Notes 12 : Network clients

After briefly exploring the basics of network programming in the previous post, we will discuss network clients in more detail in this post.

Understanding Sockets
Sockets are an extension to the operating system's I/O system that enables communication between processes and machines. A socket can be treated much like a standard file, with the same interface methods, so in many cases a program need not know whether it's writing data to a file, the terminal, or a TCP connection. While files are opened with the open() call, sockets are created with the socket() call, and additional calls are needed to connect and activate them.

Creating Sockets

For a client program, creating a socket is generally a two-step process.

  1. Create the actual socket object.
  2. Connect the socket to the remote server.

When you create a socket object, you need to tell the system two things:

  • The protocol family: the underlying protocol used to transmit data. Examples include IPv4 (the current Internet standard), IPv6 (the future Internet standard), IPX/SPX (NetWare), and AFP (Apple file sharing). By far the most common is IPv4.
  • The communication type: defines how data is transmitted.
    For Internet communications, which are the focus here, the protocol family is almost always AF_INET (corresponding to IPv4). The communication type is typically either:
    • SOCK_STREAM for TCP communications or
    • SOCK_DGRAM for UDP communications

For a TCP connection, creating a socket generally uses code like this:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

To connect the socket, you'll generally need to provide a tuple containing the remote hostname or IP address and the remote port. Connecting a socket typically looks like this:

s.connect(("www.example.com", 80))

Finding the port number

Most operating systems ship with a list of well-known server port numbers that you can query. On Windows systems, you can find this file at C:\Windows\System32\drivers\etc\services. To query this list, you need two parameters:

  • A service name (e.g. 'ftp')
  • A protocol name (e.g. 'tcp')

This query is like:

>>> print socket.getservbyname('ftp', 'tcp')

21

You didn't have to know in advance that FTP uses port 21.

Getting Information from a Socket
Once you've established a socket connection, you can find out some useful information from it.

s.getsockname() #Get your IP address and port number

s.getpeername() #Get the remote machine IP address and port number

Socket Exceptions

Different network calls can raise different exceptions when network errors occur. Python's socket module actually defines four possible exceptions:

  • socket.error for general I/O and communication problems.
  • socket.gaierror for errors looking up address information
  • socket.herror for other addressing errors.
  • socket.timeout for handling timeouts that occur after settimeout() has been called on a socket.

Complete Example

The example program takes three command-line arguments: a host to which it will connect, a port number or name on the server, and a file to request from the server. The program will connect to the server, send a simple HTTP
request for the given filename, and display the result. Along the way, it exercises care to handle various types of potential errors.

import socket, sys

host = sys.argv[1]
textport = sys.argv[2]
filename = sys.argv[3]

try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error, e:
    print "Strange error creating socket: %s" % e
    sys.exit(1)

# Try parsing the port as a numeric port number.
try:
    port = int(textport)
except ValueError:
    # That didn't work, so it's probably a protocol name.
    # Look it up instead.
    try:
        port = socket.getservbyname(textport, 'tcp')
    except socket.error, e:
        print "Couldn't find your port: %s" % e
        sys.exit(1)

try:
    s.connect((host, port))
except socket.gaierror, e:
    print "Address-related error connecting to server: %s" % e
    sys.exit(1)
except socket.error, e:
    print "Connection error: %s" % e
    sys.exit(1)

try:
    s.sendall("GET %s HTTP/1.0\r\n\r\n" % filename)
except socket.error, e:
    print "Error sending data: %s" % e
    sys.exit(1)

while 1:
    try:
        buf = s.recv(2048)
    except socket.error, e:
        print "Error receiving data: %s" % e
        sys.exit(1)
    if not len(buf):
        break
    sys.stdout.write(buf)

Using User Datagram Protocol

With UDP you do not get the same level of control over how data is sent and received as with TCP. Working with UDP clients differs from TCP clients in the following ways (a minimal UDP client sketch follows this list):

  • When you create the socket, ask for SOCK_DGRAM instead of SOCK_STREAM; this indicates to the operating system that the socket will be used for UDP instead of TCP communications.
  • When you call socket.getservbyname(), pass ‘udp’ instead of ‘tcp’.
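
As a minimal sketch, assuming a UDP echo server like the one from the previous post is listening locally on port 51423:

import socket

host = 'localhost'                           # Assumed server address
port = 51423
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.sendto('Hello, UDP', (host, port))         # No connect() needed
data, address = s.recvfrom(8192)             # Wait for the echoed reply
print "Got", data, "from", address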

In this post we discussed network clients in a bit more depth. In the next post we will discuss network servers.

Python Notes 11 : Introduction to Network Programming

Network Overview

Python provides a wide assortment of network support.

  • Low-level programming with sockets (if you want to create a protocol).
  • Support for existing network protocols (HTTP, FTP, SMTP, etc...).
  • Web programming (CGI scripting and HTTP servers).
  • Data encoding

Network Basics: TCP/IP

Python’s networking modules primarily support TCP/IP.

  • TCP - A reliable connection-oriented protocol (streams).
  • UDP - An unreliable packet-oriented protocol (datagrams).

TCP is the most common (HTTP, FTP, SMTP, etc...). Both protocols are supported using "sockets".

A socket is a file-like object: it allows data to be sent and received across the network much like a file, but it also includes functions to accept and establish connections. Before two machines can establish a connection, both must create a socket.

Network Basics: Ports

In order to receive a connection, a socket must be bound to a port (by the server). A port is a number in the range 0-65535 that's managed by the OS and used to identify a particular network service (or listener). Ports 0-1023 are reserved by the system and used for common protocols:

  • FTP Ports 20/21
  • Telnet Port 23
  • SMTP (Mail) Port 25
  • HTTP (WWW) Port 80

Ports above 1023 are available for user processes.

Socket programming in a nutshell

  • Server creates a socket, binds it to some well-known port number, and starts listening.
  • Client creates a socket and tries to connect it to the server (through the above port).
  • Server and client exchange some data.
  • Close the connection (of course the server continues to listen for more clients).

Socket Programming Example

The socket module

Provides access to low-level network programming functions. The following example is a simple server that returns the current time:

from socket import *
import time

s = socket(AF_INET, SOCK_STREAM)             # Create TCP socket
s.bind(("", 8888))                           # Bind to port 8888
s.listen(5)                                  # Start listening
while 1:
    client, addr = s.accept()                # Wait for a connection
    print "Got a connection from", addr
    client.send(time.ctime(time.time()))     # Send time back
    client.close()

Notes: The socket first opened by the server is not the same one used to exchange data. Instead, the accept() function returns a new socket for this ('client' above). listen() specifies the maximum number of pending connections.

The following example is the client program for the above time server; it connects to the time server and gets the current time.

from socket import *

s = socket(AF_INET, SOCK_STREAM)      # Create TCP socket
s.connect(("localhost", 8888))        # Connect to the server (assumed to run locally)
tm = s.recv(1024)                     # Receive up to 1024 bytes
s.close()                             # Close connection
print "The time is", tm

Notes: Once the connection is established, the server and client communicate using send() and recv(). Aside from the connection process, it's relatively straightforward. Of course, the devil is in the details, and there are a LOT of details.

The Socket Module

The socket module is used for all low-level networking: creation and manipulation of sockets, and general-purpose network functions (hostnames, data conversion, etc.). It's a direct translation of the BSD socket interface.

Utility Functions

  • socket.gethostbyname(hostname) # Get IP address for a host
  • socket.gethostname() # Name of local machine
  • socket.ntohl(x) # Convert 32-bit integer to host order
  • socket.ntohs(x) # Convert 16-bit integer to host order
  • socket.htonl(x) # Convert 32-bit integer to network order
  • socket.htons(x) # Convert 16-bit integer to network order

Comments: Network order for integers is big-endian. Host order may be little-endian or big-endian (depends on the machine).
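
A quick illustration of the byte-order helpers (the printed value assumes a little-endian machine):

import socket

port = 80
print socket.htons(port)                 # e.g. 20480 on a little-endian machine
print socket.ntohs(socket.htons(port))   # Always 80: the round trip is an identity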

The socket(family, type, proto) function creates a new socket object. Family is usually set to AF_INET. Type is one of:

  • SOCK_STREAM          Stream socket (TCP)
  • SOCK_DGRAM           Datagram socket (UDP)
  • SOCK_RAW               Raw socket

Proto is usually only used with raw sockets:

  • IPPROTO_ICMP
  • IPPROTO_IP
  • IPPROTO_RAW
  • IPPROTO_TCP
  • IPPROTO_UDP

Socket methods

  • s.accept()                  # Accept a new connection
  • s.bind(address)          # Bind to an address and port
  • s.close()                    # Close the socket
  • s.connect(address)      # Connect to remote socket
  • s.fileno()                   # Return integer file descriptor
  • s.getpeername()         # Get name of remote machine
  • s.getsockname()    #Get socket address as (ipaddr,port)
  • s.getsockopt(...)        # Get socket options
  • s.listen(backlog)        # Start listening for connections
  • s.makefile(mode)   # Turn socket into a file like object
  • s.recv(bufsize)           # Receive data
  • s.recvfrom(bufsize)    # Receive data (UDP)
  • s.send(string)           # Send data
  • s.sendto(string, address)    # Send packet (UDP)
  • s.setblocking(flag)   #Set blocking or nonblocking mode
  • s.setsockopt(...)      #Set socket options
  • s.shutdown(how)     #Shutdown one or both halves of connection

There is a huge variety of configuration/connection options; you'll definitely want a good reference at your side.

The SocketServer Module

Provides a high-level class-based interface to sockets. Each protocol is encapsulated in a class (TCPServer, UDPServer, etc.). It also provides a series of handler classes that specify additional server behavior.

To create a network service, you need to inherit from both a server (protocol) class and a handler class. Example: the same time server we built before:

import SocketServer
import time

# This class actually implements the server functionality
class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.send(time.ctime(time.time()))

# Create the server
server = SocketServer.TCPServer(("", 8888), TimeHandler)
server.serve_forever()

Notes: The module provides a number of specialized server and handler types. Ex: ForkingTCPServer, ThreadingTCPServer, StreamRequestHandler, etc.

Common Network Protocols

Modules are available for a variety of network protocols:

  • ftplib                FTP protocol
  • smtplib             SMTP (mail) protocol
  • nntplib              News
  • gopherlib          Gopher
  • poplib               POP3 mail server
  • imaplib             IMAP4 mail server
  • telnetlib            Telnet protocol
  • httplib              HTTP protocol

These modules are built using sockets but operate at a fairly low level. Working with them requires a good understanding of the underlying protocol, but they can be quite powerful if you know exactly what you are doing.
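
For instance, a minimal ftplib sketch (the host name is a placeholder and anonymous login is assumed to be allowed):

import ftplib

ftp = ftplib.FTP('ftp.example.com')   # Placeholder FTP host
ftp.login()                           # Anonymous login
ftp.retrlines('LIST')                 # Print a directory listing
ftp.quit()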

The httplib Module

Implements the HTTP 1.0 protocol and can be used to talk to a web server.

HTTP in two bullets:

  • Client (e.g., a browser) sends a request to the server

GET /index.html HTTP/1.0

Connection: Keep-Alive

Host: www.python.org

User-Agent: Mozilla/4.61 [en] (X11; U; SunOS 5.6 sun4u)

[blank line]

  • Server responds with something like this:

HTTP/1.0 200 OK

Content-type: text/html

Content-length: 72883

Headers: blah

[blank line]

Data

...

Making an HTTP connection

import httplib

h = httplib.HTTP("www.python.org")
h.putrequest('GET', '/index.html')
h.putheader('User-Agent', 'Lame Tutorial Code')
h.putheader('Accept', 'text/html')
h.endheaders()
errcode, errmsg, headers = h.getreply()
f = h.getfile()        # Get file object for reading data
data = f.read()
f.close()

You should understand some HTTP to work with httplib.

The urllib Module

A high-level interface to HTTP and FTP that provides a file-like object which can be used to connect to remote servers.

import urllib

f = urllib.urlopen("http://www.python.org/index.html")

data = f.read()

f.close()

Utility functions

  • urllib.quote(str)         # Quotes a string for use in a URL
  • urllib.quote_plus(str)    # Also replaces spaces with '+'
  • urllib.unquote(str)       # Opposite of quote()
  • urllib.unquote_plus(str)  # Opposite of quote_plus()
  • urllib.urlencode(dict)    # Turns a dictionary of key=value pairs into an HTTP query string

Examples

urllib.quote("ebeid@ieee")         #Produces "ebeid%40ieee"

urllib.unquote("%23%21/bin/sh")    #Produces "#!/bin/sh"
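
And a small urlencode example (the dictionary contents are made up; the pair order in the output may vary):

urllib.urlencode({'name': 'ebeid', 'page': 2})   #Produces "name=ebeid&page=2"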

The urlparse Module

Contains functions for manipulating URLs

  • URL’s have the following general format

scheme://netloc/path;parameters?query#fragment

  • urlparse(urlstring) - Parses a URL into components

import urlparse

t = urlparse.urlparse("http://www.python.org/index.html")

#Produces ('http', 'www.python.org', '/index.html', '', '', '')

  • urlunparse(tuple) - Turns tuple of components back into a URL string

url = urlparse.urlunparse(('http', 'www.python.org', 'foo.html', '', 'bar=spam', ''))

# Produces "http://www.python.org/foo.html?bar=spam"

  • urljoin(base, url) - Combines a base and relative URL

urlparse.urljoin("http://www.python.org/index.html","help.html")

# Produces "http://www.python.org/help.html"

In this note we explored the network programming capabilities of Python horizontally. Every single module and topic mentioned here needs multiple posts to cover it. In the upcoming posts, we will dig into network programming in detail.

Saturday, March 28, 2009

Python Notes – 10 : Threading

Welcome to our tenth note in the Python learning process. In this note we will talk about threading, threads communication and synchronization.

Threads basics

A running program is called a "process". Each process has its own memory, list of open files, stack, program counter, etc. Normally, a process executes statements in a single sequence of control flow.

Calls such as fork(), system(), and popen() create an entirely new process. The child process runs independently of the parent and has its own set of resources; there is minimal sharing of information between parent and child.

A thread, on the other hand, is kind of like a process (it's a sequence of control flow), except that it exists entirely inside a process and shares its resources. A single process may have multiple threads of execution. This is extremely useful when an application wants to perform many concurrent tasks on shared data.

Problems with Threads

  • Scheduling: To execute a threaded program, the system must rapidly switch between threads. This can be done by the user process (user-level threads) or by the kernel (kernel-level threads).
  • Resource Sharing: Since threads share memory and other resources, you must be very careful: an operation performed in one thread could cause problems in another.
  • Synchronization: Threads often need to coordinate their actions to avoid "race conditions" (outcomes that depend on the order of thread execution). You will often need to use locking primitives (mutual exclusion locks, semaphores, etc.).

Python Threads

Python supports threads on the following platforms: Solaris, Windows, and systems that support the POSIX threads library (pthreads).

Thread scheduling is tightly controlled by a global interpreter lock and scheduler. Only a single thread is allowed to be executing in the Python interpreter at once. Thread switching only occurs between the execution of individual byte-codes. Long-running calculations in C/C++ can block execution of all other threads. However, most I/O operations do not block.

Python threads are somewhat more restrictive than in C. Effectiveness may be limited on multiple CPUs (due to interpreter lock). Threads can interact strangely with other Python modules (especially signal handling). Not all extension modules are thread-safe.

The thread module

The thread module provides low-level access to threads, thread creation, and Simple mutex locks.

Creating a new thread

thread.start_new_thread(func, args [, kwargs]) executes a function in a new thread. For example:

import thread
import time

def print_time(delay):
    while 1:
        time.sleep(delay)
        print time.ctime(time.time())

thread.start_new_thread(print_time, (5,))      # Start the thread

# Go do something else
statements
…….

The function print_time will execute in a separate thread and will continue printing the time every 5 seconds. Python will continue executing our statements also.

Thread termination

A thread silently exits when its function returns. A thread can exit explicitly by calling thread.exit() or sys.exit(). An uncaught exception also causes thread termination (and prints an error message). Other threads continue to run even if one had an error.

Simple locks

thread.allocate_lock() creates a lock object, initially unlocked. Only one thread can acquire the lock at once; other threads block indefinitely until the lock becomes available.

import thread

lk = thread.allocate_lock()

def foo():

    lk.acquire()         # Acquire the lock

    …                     #critical section

    lk.release()         # Release the lock

You might use this if two or more threads were allowed to update a shared data structure.

The main thread

When Python starts, it runs as a single thread of execution. This is called the "main thread." On its own, it's not a big deal; however, if you launch other threads, it has some special properties.

Termination of the main thread

If the main thread exits and other threads are active, the behavior is system dependent. Usually, this immediately terminates the execution of all other threads without cleanup. Cleanup actions of the main thread may be limited as well.

Signal handling

Signals can only be caught and handled by the main thread of execution. Otherwise you will get an error (in the signal module). The keyboard-interrupt can be caught by any thread (non-deterministically).

The threading module

The threading module is a high-level threads module that implements threads as classes (similar to Java). It provides an assortment of synchronization and locking primitives. It is built using the low-level thread module.

Creating a new thread (as a class)

When defining threads as classes all you need to supply is the following:

  1. A constructor that calls threading.Thread.__init__(self)
  2. A run() method that performs the actual work of the thread.

A few additional methods are also available

  • t.join([timeout])    # Wait for thread t to terminate
  • t.getName()          # Get the name of the thread
  • t.setName(name)   # Set the name of the thread
  • t.isAlive()             # Return 1 if thread is alive.
  • t.isDaemon()         # Return daemonic flag
  • t.setDaemon(val)   # Set daemonic flag

Example: Inherit from the "Thread" class, provide required methods, and utilize the available methods.

import threading, time

class PrintTime(threading.Thread):
    def __init__(self, interval):
        threading.Thread.__init__(self)      # Required
        self.interval = interval
    def run(self):
        while 1:
            time.sleep(self.interval)
            print time.ctime(time.time())

t = PrintTime(5)                             # Create a thread object
t.start()                                    # Start it

Daemon threads

Normally, interpreter exits only when all threads have terminated. However, a thread can be flagged as a daemon thread (runs in background). Interpreter really only exits when all non-daemonic threads exit. You can use this to launch threads that run forever, but which can be safely killed.
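
A minimal sketch, reusing the PrintTime class from the example above (the daemon flag must be set before start() is called):

t = PrintTime(5)
t.setDaemon(True)    # Mark the thread as a daemon before starting it
t.start()
# The interpreter can now exit even though t runs an infinite loop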

Threads synchronization

The threading module provides the following synchronization primitives:

  • Mutual exclusion locks
  • Reentrant locks
  • Conditional variables
  • Semaphores
  • Events

When would you need these thread synchronization mechanisms?

  • When threads are updating shared data structures.
  • When threads need to coordinate their actions in some manner (events).

The Lock object

Provides a simple mutual exclusion lock. Only one thread is allowed to acquire the lock at once. Most useful for coordinating access to shared data.

import threading

data = [ ]                            # Some data

lck = threading.Lock()            # Create a lock

def put_obj(obj):

    lck.acquire()

    data.append(obj)

    lck.release()

def get_obj():

    lck.acquire()

    r = data.pop()

    lck.release()

    return r

The RLock object

A mutual-exclusion lock that allows repeated acquisition by the same thread. Allows nested acquire(), release() operations in the thread that owns the lock. Only the outermost release() operation actually releases the lock.

import threading

data = [ ]                    # Some data
lck = threading.RLock()       # Create a reentrant lock

def put_obj(obj):
    lck.acquire()
    data.append(obj)
    ...
    put_obj(otherobj)         # Some kind of recursion
    ...
    lck.release()

def get_obj():
    lck.acquire()
    r = data.pop()
    lck.release()
    return r

The Condition object

Creates a condition variable, a synchronization primitive typically used when a thread is interested in an event or state change. It can help with the classic producer-consumer problem.

data = []                      # Create data queue and a condition variable

cv = threading.Condition()

# Consumer thread

def consume_item():

    cv.acquire()              # Acquire the lock

    while not len(data):

        cv.wait()             # Wait for data to show up

    r = data.pop()

    cv.release()             # Release the lock

    return r

# Producer thread

def produce_item(obj):

    cv.acquire()            # Acquire the lock

    data.append(obj)

    cv.notify()              # Notify a consumer

    cv.release()           # Release the lock

Semaphores

A locking primitive based on a counter. Each acquire() method decrements the counter. Each release() method increments the counter. If the counter reaches zero, future acquire() methods block. Common use: limiting the number of threads allowed to execute code

sem = threading.Semaphore(5)      # No more than 5 threads allowed

def fetch_file(host,filename):

    sem.acquire()                         # Decrements count or blocks if zero

    ...

    sem.release()                         # Increment count

Events

A communication primitive for coordinating threads. One thread signals an "event" while other threads wait for it to happen.

e = Event()                       # Create an event object

def signal_event():             # Signal the event

     e.set()

def wait_for_event():         # Wait for event

    e.wait()

def clear_event():             # Clear event

    e.clear()

Event is similar to a condition variable, but all threads waiting for event are awakened.

Locks and Blocking

By default, all locking primitives block until lock is acquired. In general, this is uninterruptible. Fortunately, most primitives provide a non-blocking option

if not lck.acquire(0):

     # lock couldn’t be acquired!

This works for Lock, RLock, and Semaphore objects. On the other hand condition variables and events provide a timeout option

cv = Condition()

...

cv.wait(60.0)                  # Wait 60 seconds for notification

On timeout, the function simply returns. Up to caller to detect errors.

The Queue Module

Provides a multi-producer, multi-consumer FIFO queue object. It can be used to safely exchange data between multiple threads.

  • q = Queue(maxsize)    # Create a queue.
  • q.qsize()                   # Return current size.
  • q.empty()                 # Test if empty.
  • q.full()                     # Test if full.
  • q.put(item)               # Put an item on the queue.
  • q.get()                    # Get item from queue

The Queue object also supports non-blocking put/get.

  • q.put_nowait(item).
  • q.get_nowait()

These raise the Queue.Full or Queue.Empty exceptions if an error occurs. Return values for qsize(), empty(), and full() are approximate.
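
A minimal producer/consumer sketch combining the Queue module with the threading module (the item values and the None sentinel convention are just for illustration):

import threading, Queue

q = Queue.Queue()

def producer():
    for i in range(5):
        q.put(i)              # Blocks only if the queue has a maxsize and is full
    q.put(None)               # Sentinel value meaning "no more items"

def consumer():
    while 1:
        item = q.get()        # Blocks until an item is available
        if item is None:
            break
        print "Consumed", item

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()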

Things to consider when using threads

  • Global interpreter lock makes it difficult to fully utilize multiple CPUs.
  • You don’t get the degree of parallelism you might expect.
  • Not all modules are thread-friendly. Example: gethostbyname() blocks all threads if the nameserver is down.

In this note we talked about threading, thread communication, and synchronization. In the upcoming notes, we will talk about more advanced topics in Python programming.

Friday, March 27, 2009

Python Notes – 9 : Serialization

Welcome to our ninth note in our Python learning process. We talked previously about files and how to handle them, but we covered writing and reading only primitive data types such as integers and strings. We also talked about objects and classes. Now, what if we want to write a compound data type or a complex object to a file? This note talks about writing objects to files, which is called object serialization.

pickle

The pickle module is a built-in Python module that provides object serialization and de-serialization. To store a data structure, open a file for writing, use the dump method, and then close the file in the usual way:

>>> import pickle

>>> f = open("test.pck","w")

>>> pickle.dump(12.3, f)

>>> pickle.dump([1,2,3], f)

>>> f.close()

Then we can open the file for reading and load the data structures we dumped:

>>> f = open("test.pck","r")

>>> x = pickle.load(f)

>>> x

12.3

>>> type(x)

<type 'float'>

>>> y = pickle.load(f)

>>> y

[1, 2, 3]

>>> type(y)

<type 'list'>

Each time we invoke load, we get a single value from the file, complete with its original type.

What can be serialized and de-serialized

The following types can be serialized and de-serialized using pickle:

  • None, True, and False
  • integers, long integers, floating point numbers, complex numbers
  • normal and Unicode strings
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose __dict__ or __setstate__() is picklable

Things to consider when using pickle

  • Attempts to pickle unpicklable objects will raise the PicklingError exception; when this happens, an unspecified number of bytes may have already been written to the underlying file.
  • Trying to pickle a highly recursive data structure may exceed the maximum recursion depth, a RuntimeError will be raised in this case. You can carefully raise this limit with sys.setrecursionlimit().

cPickle

cPickle is an optimized version of pickle written in C, so it can be up to 1000 times faster than pickle.
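
Since cPickle exposes the same interface as pickle, a common idiom is to try it first and fall back to pickle if it is unavailable:

try:
    import cPickle as pickle      # Fast C implementation
except ImportError:
    import pickle                 # Pure-Python fallback

f = open("test.pck", "w")
pickle.dump([1, 2, 3], f)         # Same calls as before
f.close()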

marshal

The marshal module can also be used for serialization. Marshal is similar to pickle, but is intended only for simple objects; it can't handle recursion or class instances. On the plus side, it's pretty fast if you just want to save simple objects to a file. Data is stored in a binary, architecture-independent format. To serialize:

import marshal

marshal.dump(obj,file)                  # Write obj to file

To unserialize:

obj = marshal.load(file)

shelve

The shelve module provides a persistent dictionary. It works like a dictionary, but the data is stored on disk.

Keys must be strings. Data can be any object serializable with pickle.

import shelve

d = shelve.open("data") # Open a 'shelf'

d['foo'] = 42 # Save data

x = d['bar'] # Retrieve data

Shelf operations

d[key] = obj              # Store an object

obj = d[key]              # Retrieve an object

del d[key]                 # Delete an object

d.has_key(key)          # Test for existence of key

d.keys()                   # Return a list of all keys

d.close()                  # Close the shelf

In this note we talked about writing objects to files, which is called object serialization. Object serialization is very useful for persisting your application state so you can resume execution later, transferring execution to a remote machine, and many other application scenarios.

Python Notes – 8 : Object-Oriented Basics

Welcome to our eighth note in our Python learning process. This note will talk about object oriented features in Python.

Classes and Objects

A class definition looks like this:

class Point:

    pass

Class definitions can appear anywhere in a program, but they are usually near the beginning (after the import statements). By creating the Point class, we created a new type, also called Point. The members of this type are called instances of the type or objects. Creating a new instance is called instantiation. To instantiate a Point object, we call a function named Point:

>>> blank = Point()

The variable blank is assigned a reference to a new Point object. A function like Point that creates new objects is called a constructor. If you tried to get the type of blank, you got instance:

>>> type(blank)

<type 'instance'>

If you tried to print blank:

>>> print blank

<__main__.Point instance at 0x01922AF8>

The result indicates that blank is an instance of the Point class and it was defined in __main__ . 0x01922AF8 is the unique identifier for this object, written in hexadecimal (base 16).

Attributes

We can add new data to an instance using dot notation:

>>> blank.x = 3.0

>>> blank.y = 4.0

These new data items called attributes.

>>> print blank.y

4.0

>>> x = blank.x

>>> print x

3.0

Sameness

To find out if two references refer to the same object, use the == operator. For example:

>>> p1 = Point()

>>> p1.x = 3

>>> p1.y = 4

>>> p2 = Point()

>>> p2.x = 3

>>> p2.y = 4

>>> p1 == p2

0

Even though p1 and p2 contain the same coordinates, they are not the same object. If we assign p1 to p2, then the two variables are aliases of the same object:

>>> p2 = p1

>>> p1 == p2

1

This type of equality is called shallow equality because it compares only the references, not the contents of the objects. To compare the contents of the objects - deep equality - we can write our own function, like this:

def samePoint(p1, p2):

    return (p1.x == p2.x) and (p1.y == p2.y)

Now if we create two different objects that contain the same data, we can use samePoint to find out if they represent the same point.

>>> p1 = Point()

>>> p1.x = 3

>>> p1.y = 4

>>> p2 = Point()

>>> p2.x = 3

>>> p2.y = 4

>>> samePoint(p1, p2)

1

Copying

Aliasing can make a program difficult to read because changes made in one place might have unexpected effects in another place. Copying an object is often an alternative to aliasing. The copy module contains a function called copy that can duplicate any object:

>>> import copy

>>> p1 = Point()

>>> p1.x = 3

>>> p1.y = 4

>>> p2 = copy.copy(p1)

>>> p1 == p2

0

>>> samePoint(p1, p2)

1

copy works fine for objects that don't contain any embedded objects. If the object contains references to other objects, copy will copy those references into the new object, so both copies end up referencing the same internal objects.

You can use deepcopy, which copies not only the object itself but also any embedded objects.

>>> p2 = copy.deepcopy(p1)

Now p1 and p2 are completely separate objects.

The initialization method

The initialization method is a special method that is invoked when an object is created. The name of this method is __init__.

class point:

    def __init__(self, x = 0, y = 0):

        self.x = x

        self.y = y

When we invoke the point constructor, the arguments we provide are passed along to init:

>>> first = point(5,7)

>>> first.x

5

>>> first.y

7

Because the parameters are optional, we can omit them:

>>> second = point()

>>> second.x

0

>>> second.y

0

We can also provide a subset of the parameters by naming them explicitly:

>>> third = point(y=10)

>>> third.x

0

>>> third.y

10

The __str__ method

The __str__ method of any class is called by Python in any operation that requires the class instance to be converted to a string, such as print. For example:

class xyz:

    def __str__(self):

        return "Our class xyz"

>>> a = xyz()

>>> a

<__main__.xyz instance at 0x02627300>

>>> print a

Our class xyz

Instances as parameters

You can pass an instance as a parameter in the usual way. For example:

def printPoint(p):

    print '(' + str(p.x) + ', ' + str(p.y) + ')'

Instances as return values

Functions can return instances. For example:

def sumPoints(A, B):

    Z = Point()

    Z.x = A.x + B.x

    Z.y = A.y + B.y

    return Z

Operator overloading

Operator overloading means changing the definition and behavior of the built-in operators when they are applied to user-defined types. For example, to override the addition operator + , we provide a method named __add__ in our point class :

class Point:

    def __add__(self, other):

        return Point(self.x + other.x, self.y + other.y)

The first parameter is the object on which the method is invoked. The second parameter is conveniently named other to distinguish it from self. Now, when we apply the + operator to Point objects, Python invokes __add__:

>>> p1 = Point(3, 4)

>>> p2 = Point(5, 7)

>>> p3 = p1 + p2

>>> print p3

(8, 11)

The expression p1 + p2 is equivalent to p1.__add__(p2), but obviously more elegant. You can change the behavior of many operators by overloading their respective special methods, which are listed at http://www.python.org/doc/2.2/ref/numeric-types.html
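
Note that for print p3 to display (8, 11) as shown, the Point class also needs an __init__ that accepts coordinates and a __str__ method; here is a minimal sketch combining the pieces used in this note:

class Point:
    def __init__(self, x = 0, y = 0):
        self.x = x
        self.y = y
    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)
    def __str__(self):
        return '(' + str(self.x) + ', ' + str(self.y) + ')'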

Inheritance

Inheritance is the ability to define a new class that is a modified version of an existing class. The new class inherits all of the methods of the existing class. The new class may be called child class or subclass. The syntax is like:

class class1(object):

    K = 7

    def __init__(self, color='green'):

        self.color = color

    def Hello1(self):

        print "Hello from class1"

    def printColor(self):

        print "preferred ", self.color

class class2(class1):

    def Hello2(self):

        print "Hello from class2"

        print self.K

Here class2 is the child of class1.

>>> c1 = class1('blue')

>>> c2 = class2('red')

>>> c1.Hello1()

Hello from class1

>>> c2.Hello2()

Hello from class2

7

Child class can access parent class methods

>>> c2.Hello1()

Hello from class1

The parent constructor is called automatically for child classes, as follows:

>>> c1.printColor()

preferred blue

>>> c2.printColor()

preferred red

You can check for class methods and attributes using the hasattr function:

if hasattr(class1, "Hello2"):

    print c1.Hello2()

else:

    print "Class1 does not contain method Hello2()"

Class1 does not contain method Hello2()

To check the inheritance relation between two classes:

if issubclass(class2, class1):

     print "Class2 is a subclass of Class1"

In this note we tried to cover as much as we could of Python's object-oriented features. We will give them a more advanced note in the future.

Monday, March 23, 2009

Python Notes – 7 : Files & directories

Welcome to our seventh note in our Python learning process. This note will talk specifically about files, directories, and exceptions.

Files

Opening a file creates a file object. Syntax is like that:

>>> f = open("test.dat","w")

>>> print f

<open file 'test.dat', mode 'w' at fe820>

The first parameter to open is the file name, the second parameter is the mode. Modes are: w for write, r for read.

To write data in the file we invoke the write method on the file object:

>>> f.write("Now is the time")

>>> f.write("to close the file")

After we are done, we can close the file:

>>> f.close()

The read method reads data from the file. First reopen the file for reading; then, with no arguments, read returns the entire contents of the file:

>>> f = open("test.dat","r")

>>> text = f.read()

>>> print text

Now is the timeto close the file

read can also take an argument that indicates how many characters to read.

If not enough characters are left in the file, read returns the remaining characters. When we get to the end of the file, read returns the empty string:

>>> print f.read(5)

Now i

>>> print f.read(1000006)

s the timeto close the file

>>> print f.read()

>>>

The write method writes data to the file. It accepts strings only.

>>> x = 52

>>> f.write (str(x))

Directories

When you create a new file by opening it and writing, the new file goes in the current directory (wherever you were when you ran the program). Similarly, when you open a file for reading, Python looks for it in the current directory. If you want to open a file somewhere else, you have to specify the path to the file, which is the name of the directory (or folder) where the file is located:

>>> f = open("/usr/share/dict/words","r")

>>> print f.readline()

(whatever the first line of the file happens to be)

glob module

Returns filenames in a directory that match a pattern.

import glob

a = glob.glob("*.html")

b = glob.glob("image[0-5]*.gif")

Pattern matching is performed using the rules of the Unix shell. Tilde (~) and variable expansion are not performed.

fnmatch module

Matches filenames according to rules of Unix shell.

import fnmatch

if fnmatch.fnmatch(filename, "*.html"):

    ...

Case-sensitivity depends on the operating system.

Other File-Related Modules

  • fcntl : Provides access to the fcntl() system call and file-locking operations

import fcntl

fcntl.flock(f.fileno(), fcntl.LOCK_EX) # Lock a file

  • tempfile : Creates temporary files (a short sketch follows the gzip example below)
  • gzip : Creates file objects with compression/decompression. Compatible with the GNU gzip program.

import gzip

f = gzip.open("foo","wb")

f.write(data)
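
For the tempfile module mentioned above, a minimal sketch:

import tempfile

tmp = tempfile.TemporaryFile()     # Deleted automatically when closed
tmp.write("scratch data")
tmp.seek(0)
print tmp.read()
tmp.close()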

Exceptions

Whenever a runtime error occurs, it creates an exception. Usually, the program stops and Python prints an error message.

>>> print 55/0

ZeroDivisionError: integer division or modulo

We can handle the exception using the try and except statements.

filename = raw_input('Enter a file name: ')

try:

    f = open(filename, "r")

except:

    print 'There is no file named', filename

The try statement executes the statements in the first block. If no exceptions occur, it ignores the except statement. If any exception occurs, it executes the statements in the except branch and then continues.

If your program detects an error condition, you can make it raise an exception.

def inputNumber():

    x = input('Pick a number: ')

    if x == 17:

        raise 'BadNumberError', '17 is a bad number'

    return x

The raise statement takes two arguments: the exception type and specific information about the error. Here is how the previous example behaves:

>>> inputNumber ()

Pick a number: 17

BadNumberError: 17 is a bad number

This note talked about files, directories, and exceptions. The next note will go into the object oriented features of Python.