Wednesday, June 26, 2013

Redis 101

Redis (REmote DIctionary Service) is an open-source networked in-memory key-value data store. First released in 2009, currently sponsored by VMware, and since then it have been ranked the most popular key-value store by DB-Engine. Redis creator Salvatore Sanfilippo refers to his project as a “data structure server” to capture its unique handling of complex data types and other features. Interested ? Enough talking, let’s get down to business.

Download Redis installer from here.

Installation is straight forward:

Setup - Redis_2013-06-27_11-42-13

Setup - Redis_2013-06-27_11-42-21

Setup - Redis_2013-06-27_11-42-25

Setup - Redis_2013-06-27_11-42-29

Setup - Redis_2013-06-27_11-42-32

Setup - Redis_2013-06-27_11-42-36

Setup - Redis_2013-06-27_11-45-05

Start the Server go to Control Panel >> Administrative Tools >> Services , right click Redis Server and select Start.

Services_2013-06-27_11-49-50

Start the Client look for Redis Client (with red cube icon) in your start menu and open it. As you see it listens to port 6379 by default. You could ping your server by writing PING, if everything is good you will get PONG in return.

Redis Client_2013-06-27_11-56-46

GET & SET

As we said before, Redis is a key-value store. We store value inside a key and retrieve it later using that key. We can use the command SET to store the value "fido" at key "server:name" (server replies with Ok if everything is good) (Note that we used colons [:] within our key. This is a valid character that often logically separates a key into segments. It’s merely a matter of convention, with no deeper meaning in Redis.)

 SET server:name "fido"
To retrieve our value, we use GET command (server replies with “fido”)
 GET server:name 
You could also delete a given key and its associated value using DEL command and you can check for key existence using EXISTS (1 exists, 0 otherwise). We get and set multiple values in one command using MSET & MGET
 MSET 1 valueA 2 valueB
 MGET 1 2

Although Redis stores strings, it recognizes integers and provides some simple operations for them. If we want to keep a running total of something, we can create a count and then increment it with the INCR command (gives error for noninteger values).

 SET count 2
 INCR count
 GET count
Redis Client_2013-06-27_13-01-40

You can also increment by any integer (INCRBY) or decrement (DECR, DECRBY).


Redis Client_2013-06-27_13-04-08


It is good to know that these simple operations are atomic in Radis.


Expire


You can tell Radis to hold a key only for a certain period of time (in seconds) using EXPIRE command. You can test how long a key will exist for with the TTL command. It returns the number of seconds until it will be deleted (-1 means the key will never expire). All keys are permanent by default.


Redis Client_2013-06-27_13-40-01 


Setting and expiring keys is so common that Redis provides a shortcut command called SETEX. Also at any moment before the key expires, you can remove the timeout by running PERSIST KeyName.


A common trick for keeping only recently used keys is to update the expire time whenever you retrieve a value. This will ensure that your most recently used keys will remain in Redis, while the least recently used keys will just expire as normal.


Transactions


Redis like most DBMSs supports transactions. We begin the transaction with the MULTI command and execute it with EXEC command. So, wrapping two operations like SET and INCR in a single block will complete either successfully or not at all.


Redis Client_2013-06-27_13-17-56


As you may guess from the server response inside the transaction, commands are not executed instantly. Instead, they are queued and then executed in sequence.


Similar to ROLLBACK in SQL, you can stop a transaction with the DISCARD command, which will clear the transaction queue. Unlike ROLLBACK, it won’t revert the database; it will simply not run the transaction at all. The effect is identical, although the underlying concept is a different mechanism (transaction rollback vs. operation cancellation).


Complex Data Types


Radis can store lists, hashes, sets, and sorted sets natively. These collection data types can contain a huge number of values (up to 2^32 elements or more than 4 billion) per key. That’s more than enough for all Facebook accounts to live as a list under a single key.


List


A list is a series of ordered values that can act both as queues (first value in, first value out) and as stacks (last value in, first value out). Some of the important commands for interacting with lists are RPUSH, LPUSH, LLEN, LRANGE, LPOP, and RPOP. You can immediately begin working with a key as a list, as long as it doesn't already exist as a different type.



  • RPUSH puts the new value at the end (right) of the list.

  • LPUSH puts the new value at the start (left) of the list.

Redis 101 - Windows Live Writer_2013-06-27_14-42-33



  • LRANGE gives a subset of the list. It takes the index of the first element you want to retrieve as its first parameter and the index of the last element you want to retrieve as its second parameter. A value of -1 for the second parameter means to retrieve all elements in the list.

Redis Client_2013-06-27_14-46-47



  • LLEN returns the current length of the list.

  • LPOP removes the first (left) element from the list and returns it.

  • RPOP removes the last (right) element from the list and returns it.


    • Use LPUSH/RPOP to make List act like a queue.

    • Use LPUSH/LPOP to make List act like a stack.

  • LREM removes (some or all) matching elements from the list. It also requires a number to know how many matches to remove. Setting the count to 0 as we do here just removes them all. Setting the count greater than 0 will remove only that number of matches, and setting the count to a negative number will remove that number of
    matches but scan the list from the end (right side).

Redis Client_2013-06-27_14-53-25



  • RPOPLPUSH a single command for popping values from the tail of one
    list (right pop) and pushing to the head of another (left push). Surprisingly, there is no commands for RPOPRPUSH, LPOPLPUSH, or LPOPRPUSH.

Blocking Lists


It really interesting to know that Redis provides message queuing APIs natively. So, assume you want to write a simple messaging system where multiple clients can push data to one side of the queue and one or more listeners pop data from the other side of the queue. Listeners should just listen for new data and pop them as they arrive.



  • BRPOP the blocking version of RPOP, it blocks the connection when there are no elements to pop from any of the given lists.

Redis Client_2013-06-28_11-04-25


now open another Redis client window and push a message into messages list


Redis Client_2013-06-28_11-04-53


if you switched back to the blocking BRPOP command window, you will find it return the key, the popped value, and the time spent blocking


Redis Client_2013-06-28_11-05-11



  • BLPOP the blocking version of left pop LPOP

  • BRPOPLPUSH the blocking version of right pop left push BRPOPLPUSH

Set


Sets are unordered collections with no duplicate values and are an excellent choice for performing complex operations between two or more key values, such as unions or intersections.



  • SADD adds the given value to the set (duplication is not allowed).

  • SREM removes the given value from the set.

Redis Client_2013-06-27_16-41-05



  • SISMEMBER tests if the given value is in the set.

  • SMEMBERS returns a list of all the members of this set.

  • SUNION combines two or more sets and returns the list of all elements.

  • SINTER returns the intersection of two or more lists.

  • SDIFF returns the elements that exist in the first list only but not in the second list.

Redis 101 - Windows Live Writer_2013-06-27_17-00-54



  • SUNIONSTORE / SIMTERUNION / SDIFF stores the result of operation done on the second and third parameters into the first parameter.

Redis Client_2013-06-27_17-09-00



  • SMOVE moves one element from one set to another.

  • SCARD counts the set elements (set cardinality).

Redis Client_2013-06-27_17-21-00



  • SPOP returns a random element from a set and removes it.

  • SRANDMEMBER returns a random element from a set but does not remove it

Sorted Sets


Sorted set is similar to a regular set, but now each value has an associated score. This score is used to sort the elements in the set. Sorted sets take something from each of the other collections data types. They are ordered like lists and are unique like sets. They have field-value pairs like hashes, but rather than string fields, they are instead numeric scores that denote the order of the values. Internally, sorted sets keep values in order, so inserts can take log(N) time to insert (where N is the size of the set), rather than the constant time complexity of hashes or lists.



  • ZADD add a value with a specified score.

  • ZRANGE returns the elements between start index (zero-based) and end index (-1 means to the end of the sorted set. Put WITHSCORES at the end of command to get elements and scores, not only the elements. To get them in reverse, insert the word REV, as in ZREVRANGE.

Redis Client_2013-06-28_08-51-02



  • ZRANGEBYSCORE since we are using a sorted set, it’s more likely to get elements by score ranges instead of index ranges (x <= score <= y). We can make a score number exclusive by prefixing it with an opening parenthesis ( x < score <= y).

Redis Client_2013-06-28_11-49-34



  • -inf / inf negative / positive infinities. can be used in retrieval (by index or by score). We can also range by both positive and negative values.

  • ZREMRANGEBYRANK / ZREMRANGEBYSCORE remove values by rank or score.

  • ZINCRBY Increments the score of member in the sorted set stored at key by increment. If member does not exist in the sorted set, it is added with increment as its score (as if its previous score was 0.0). If key does not exist, a new sorted set with the specified member as its sole member is created. It is possible to provide a negative value to decrement the score.

Redis Client_2013-06-28_08-58-48



  • ZUNIONSTORE Computes the union of numkeys sorted sets given by the specified keys, and stores the result in destination set. It is mandatory to provide the number of input keys (numkeys) before passing the input keys and the other (optional) arguments. Parameters needed:


    • destination ==> a set to hold the union operation result

    • numkeys ==> number of keys you are going to join.

    • weight ==> optional number(s) to multiply each score of the relative key by (if you have two keys, you can have two weights and son on)

    • aggregate ==> optional rule to specify how the results of the union are aggregated. This option defaults to SUM, where the score of an element is summed across the inputs where it exists. When this option is set to either MIN or MAX, the resulting set will contain the minimum or maximum score of an element across the inputs where it exists.

Redis 101 - Windows Live Writer_2013-06-28_12-29-14


In the above example we prepared two sorted sets, sset1 with elements (1 “one” 2 “two”) , sset2 with elements (1 “one” 2 “two” 3 “three”). Then tell Redis to merge these two sets with the following parameters:




    • store the operation result in sorted set named out

    • we going to merge 2 sorted sets, then provide them sset1 sset2.

    • we going to use WEIGHTS, then provide these weights 2 3.

Hash


Hash is a collections object that can hold any number of key-value pairs.



  • HMSET store the key-value pairs into the hash (first parameter.

  • HVALS returns the values of a hash.

  • HKEYS returns the keys of a hash.

  • HGET returns the value of of a single key in a hash.

Redis Client_2013-06-28_10-09-24



  • HDEL deletes a key-value pair from a hash.

  • HSET updates or adds a value at a specified key in a hash.

  • HINCRBY increments an integer field value by some count in a hash. It is possible to provide a negative value to decrement.

  • HLEN returns the number of fields in a hash.

Redis Client_2013-06-28_10-17-28





Database Namespaces


You may have been wondering, which database we working on. So far, we have interacted only with a single namespace. In Redis’s terminology, a namespace is called a database and is keyed by a number. So far, we’ve always interacted with the default namespace 0 (also known as database 0).


  • Use SELECT command to switch to different database and provide it a DB number. Setting a value to a key in one database will not affect any other key in other databases.

clip_image001



  • Since all databases are running in the same server instance, Redis lets us move keys from one database to another with the MOVE command.

clip_image002


This can be useful for running different applications against a single Redis server but still allow these multiple applications to trade data between each other.

More commands



  • RENAME rename a key

  • TYPE return the type of a key’s value

  • DEL deletes a key-value pair

  • FLUSHDB removes all keys in database

  • FLUSHALL removes all keys from all databases on this server instance.

Redis’s data types and the complex queries it can perform make it much more than a standard key-value store. It can act as a stack, queue, or priority queue; can be an object store (via hashes); and even can perform complex set operations such as unions, intersections, and subtractions (diff). It provides many atomic commands, and for those multistep commands, it provides a transaction mechanism. It has a built-in ability to expire keys, which is useful as a cache.




In this post we just introduced Redis as a data structure server. In later post we going to cover the advanced features of Redis.

Monday, June 24, 2013

PostgreSQL 101

PostgreSQL is the finest open source example of the relational database management system (RDBMS) tradition. This means that PostGreSQL is a design-first data store. First you design the schema, and then you enter data that conforms to the definition of that schema.

History of PostgreSQL

PostgreSQL has existed in the current project incarnation since 1995, but its roots
are considerably older. The original project was written at Berkeley in the early 1970s
and called the Interactive Graphics and Retrieval System, or “Ingres” for short. In the
1980s, an improved version was launched post-Ingres—shortened to Postgres. The
project ended at Berkeley proper in 1993 but was picked up again by the open source
community as Postgres95. It was later renamed to PostgreSQL in 1996 to denote its
rather new SQL support and has remained so ever since.

Installation

You can download the latest PostgreSQL installer from http://www.postgresql.org/download (In this tutorial I’m using the Windows 64bit installer). Setup is straight-forward. Below are screen-shots of installation steps:

Setup_2013-06-24_14-23-08

Setup_2013-06-24_14-23-13

Setup_2013-06-24_14-23-16

remember the superuser password, you will need it later

Setup_2013-06-24_14-23-37

Setup_2013-06-24_14-23-41

Setup_2013-06-24_14-23-44

Setup_2013-06-24_14-23-47

Setup_2013-06-24_14-23-56

when the PostgreSQL installation finish, you could choose to run Stack Builder to install additional tools. In my case I chose to run Stack builder

Setup_2013-06-24_14-25-58

choose your recently installed instance

Stack Builder 3.1.0_2013-06-24_14-26-15

and I choose to install Npgsql database driver (for future .Net development) and phpPgAdmin for administration tasks.

Stack Builder 3.1.0_2013-06-24_14-26-54

Stack Builder 3.1.0_2013-06-24_14-27-00

Stack Builder 3.1.0_2013-06-24_14-27-13

Stack Builder will lunch installation wizards for your chosen tools in sequence

Setup_2013-06-24_14-27-30 

Setup_2013-06-24_14-27-34

Setup_2013-06-24_14-27-36

Setup_2013-06-24_14-27-41

Setup_2013-06-24_14-30-25

Setup_2013-06-24_14-30-37

Setup_2013-06-24_14-30-42

Setup_2013-06-24_14-30-47

Setup_2013-06-24_14-30-53

Setup_2013-06-24_14-31-12

Stack Builder 3.1.0_2013-06-24_14-31-19

Starting with PostgreSQL

To get started, open pgAdmin III from your start menu. On the left hand side right-click on your server instance and click Connect

pgAdmin III_2013-06-24_15-08-22 

enter the password you used in the setup

Connect to Server_2013-06-24_15-08-43

Once you connected to your server, you can right click on the databases node and click New Database

pgAdmin III_2013-06-24_15-09-33

we going to create a simple database called book with the default settings

New Database..._2013-06-24_16-45-31

to start writing queries, you could open Tools menu, then click Query Tool (or press CTRL + E).

Working with Tables

Creating a table consists of giving it a name and a list of columns with types
and (optional) constraint information. Each table should also nominate a
unique identifier column to pinpoint specific rows. That identifier is called a
PRIMARY KEY. The SQL to create a countries table looks like this:

CREATE TABLE countries (country_code char(2) PRIMARY KEY,country_name text UNIQUE);


copy and paste this code in the Query Tool and click Query - book on postgres@localhost5432 _2013-06-24_16-54-24 on the toolbar to run it.


To insert values in our countries table:

INSERT INTO countries (country_code, country_name)
VALUES ('us','United States'), ('mx','Mexico'), ('au','Australia'),
('uk','United Kingdom'), ('de','Germany'), ('ll','Loompaland');


To select from our table:

SELECT * FROM countries;


To delete :

DELETE FROM countries
WHERE country_code = 'll';

Now, let’s add a cities table. To ensure any inserted country_code also exists in our countries table, we add the REFERENCES keyword. Since the country_code column references another table’s key, it’s known as the foreign key constraint.

CREATE TABLE cities (
name text NOT NULL,
postal_code varchar(9) CHECK (postal_code <> ''),
country_code char(2) REFERENCES countries,
PRIMARY KEY (country_code, postal_code)
);

we constrained the name in cities by disallowing NULL values. We constrained postal_code by checking that no values are empty strings (<> means not equal). Furthermore, since a PRIMARY KEY uniquely identifies a row, we created a compound key: country_code + postal_code. Together, they uniquely define a row. now if you try to run this code you will get an error because we violating referential integrity:

INSERT INTO cities
VALUES ('Toronto','M4C1B5','ca');

To update records:

INSERT INTO cities
VALUES ('Portland','87200','us');
UPDATE cities
SET postal_code = '97205'
WHERE name = 'Portland';

Inner Join


Being a relational database gives PostgreSQL the ability to join tables together when reading them. Joining, in essence, is an operation taking two separate tables and combining them in some way to return a single table. It’s somewhat like shuffling up Scrabble pieces from existing words to make new words. The basic form of a join is the inner join. In the simplest form, you specify two columns (one from each table) to match by, using the ON keyword.

SELECT cities.*, country_name
FROM cities INNER JOIN countries
ON cities.country_code = countries.country_code;

Query - book on postgres@localhost5432 _2013-06-25_09-07-07


The join returns a single table, sharing all columns’ values of the cities table plus the matching country_name value from the countries table. We can also join a table like cities that has a compound primary key. To test a compound join, let’s create a new table that stores a list of venues. A venue exists in both a postal code and a specific country. The foreign key must be two columns that reference both cities primary key columns. (MATCH FULL is a constraint that ensures either both values exist or both are NULL.)

CREATE TABLE venues (
venue_id SERIAL PRIMARY KEY,
name varchar(255),
street_address text,
type char(7) CHECK ( type in ('public','private') ) DEFAULT 'public',
postal_code varchar(9),
country_code char(2),
FOREIGN KEY (country_code, postal_code)
REFERENCES cities (country_code, postal_code) MATCH FULL
);

This venue_id column is a common primary key setup: automatically incremented integers (1, 2, 3, 4, and so on…). Creating new row will populate this column automatically. We make this identifier using the SERIAL keyword. CHECK keyword check that a column supplied value against a set of predefined values. DEFAULT provide a default value to be inserted in case user supplied nothing.

INSERT INTO venues (name, postal_code, country_code)
VALUES ('Crystal Ballroom', '97205', 'us');

Joining the venues table with the cities table requires both foreign key columns.

SELECT venues.venue_id, venues.name, cities.name
FROM venues INNER JOIN cities
ON venues.postal_code = cities.postal_code AND venues.country_code = cities.country_code;

Query - book on postgres@localhost5432 _2013-06-25_11-03-08


Outer Join


Outer joins are a way of merging two tables when the results of one table must always be returned, whether or not any matching column values exist on the other table. Let's create a table and populate it for testing this:

CREATE TABLE events (
event_id SERIAL PRIMARY KEY,
title varchar(255),
starts timestamp,
ends timestamp,
venue_id integer,
FOREIGN KEY (venue_id)
REFERENCES venues (venue_id) MATCH FULL
);

INSERT INTO events (title, starts, ends, venue_id)
VALUES ('LARP Club', '2012-02-15 17:30:00', '2012-02-15 19:30:00', 1);
INSERT INTO events (title, starts, ends)
VALUES ('April Fools Day', '2012-04-01 00:00:00', '22012-04-01 23:59:00');
INSERT INTO events (title, starts, ends)
VALUES ('Christmas Day', '2012-12-25 00:00:00', '2012-04-01 23:59:00');

Now the results of the following two queries showing you the difference between INNER JOIN and OUTER JOIN.

SELECT events.title, venues.name
FROM events JOIN venues
ON events.venue_id = venues.venue_id;

SELECT events.title, venues.name
FROM events LEFT JOIN venues
ON events.venue_id = venues.venue_id;

Finally, there’s the FULL JOIN, which is the union of LEFT and RIGHT; you’re guaranteed all values from each table, joined wherever columns match.


Indexes


RDBMSs uses indexes to reducing disk reads and query optimization (among other things). If we select the title of Christmas Day from the events table, the algorithm must scan every row for a match to return. Without an index, each row must be read from disk to know whether a query should return it. An index is a special data structure built to avoid a full table scan when performing a query. PostgreSQL (like any other RDBMS) automatically creates an index on the primary key, where the key is the primary key value and where the value points to a row on disk. Using the UNIQUE keyword is another way to force an index on a table column. PostgreSQL also creates indexes for columns targeted by FOREIGN KEY constraint.


You can explicitly add a hash index using the CREATE INDEX command, where each value must be unique. btree is the suitable data structure for ranges. For more information, refer to online documentation.

CREATE INDEX events_starts
ON events USING btree (starts);

Aggregate Functions


An aggregate query groups results from several rows by some common criteria. It can be as simple as counting the number of rows in a table or calculating the average of some numerical column. Examples:

SELECT count(title)
FROM events

SELECT min(starts), max(ends)
FROM events

Aggregate functions are useful but limited on their own. If we wanted to count all events at each venue, we could write the following for each venue ID (which is tedious):

SELECT count(*) FROM events WHERE venue_id = 1;
SELECT count(*) FROM events WHERE venue_id = 2;
SELECT count(*) FROM events WHERE venue_id = 3;
SELECT count(*) FROM events WHERE venue_id IS NULL;

Grouping


With GROUP BY, you tell Postgres to place the rows into groups and then perform some aggregate function (such as count()) on those groups.

SELECT venue_id, count(*)
FROM events
GROUP BY venue_id;

Query - book on postgres@localhost5432 _2013-06-25_13-49-47


The GROUP BY condition has its own filter keyword: HAVING. HAVING is like the WHERE clause, except it can filter by aggregate functions (whereas WHERE cannot).

SELECT venue_id
FROM events
GROUP BY venue_id
HAVING count(*) = 1;

Query - book on postgres@localhost5432 _2013-06-25_13-50-56


You can use GROUP BY without any aggregate functions. If you SELECT one column, you get all unique values.
SELECT venue_id FROM events GROUP BY venue_id;
This kind of grouping is so common that SQL has a shortcut in the DISTINCT keyword.
SELECT DISTINCT venue_id FROM events;
The results of both queries will be identical.


Window Functions


Window functions are similar to GROUP BY queries in that they allow you to run aggregate functions across multiple rows. The difference is that they allow you to use built-in aggregate functions without requiring every single field to be grouped to a single row. If we attempt to select the title column without grouping by it, we can expect an error.


Query - book on postgres@localhost5432 _2013-06-25_14-10-42


We are counting up the rows by venue_id, and in the case of LARP Club and Wedding, we have two titles for a single venue_id. Postgres doesn’t know which title to display. Whereas a GROUP BY clause will return one record per matching group value, a window function can return a separate record for each row. Window functions return all matches and replicate the results of any aggregate function. It returns the results of an aggregate function OVER a PARTI TI ON of the result set.


Query - book on postgres@localhost5432 _2013-06-25_14-17-36


Transactions


Transactions ensure that every command of a set is executed. If anything fails along the way, all of the commands are rolled back like they never happened. It’s the all or nothing motto that gives relational databases its consistency capability.


PostgreSQL transactions follow ACID compliance, which stands for Atomic (all ops succeed or none do), Consistent (the data will always be in a good state—no inconsistent states), Isolated (transactions don’t interfere), and Durable (a committed transaction is safe, even after a server crash). Transactions are useful when you’re modifying two tables that you don’t want out of sync.


We can wrap any transaction within a BEGIN TRANSACTION block. To verify atomicity, we’ll kill the transaction with the ROLLBACK command.


Stored Procedures


Every command we’ve seen until now has been declarative (executed on the client side), but you can execute code on the database side also. Example of a stored procedure and how to run it:

CREATE OR REPLACE FUNCTION add_event( title text, starts timestamp,
ends timestamp, venue text, postal varchar(9), country char(2) )
RETURNS boolean AS $$
DECLARE
did_insert boolean := false;
found_count integer;
the_venue_id integer;
BEGIN
SELECT venue_id INTO the_venue_id
FROM venues v
WHERE v.postal_code=postal AND v.country_code=country AND v.name ILIKE venue
LIMIT 1;
IF the_venue_id IS NULL THEN
INSERT INTO venues (name, postal_code, country_code)
VALUES (venue, postal, country)
RETURNING venue_id INTO the_venue_id;
did_insert := true;
END IF;
-- Note: not an “error”, as in some programming languages
RAISE NOTICE 'Venue found %', the_venue_id;
INSERT INTO events (title, starts, ends, venue_id)
VALUES (title, starts, ends, the_venue_id);
RETURN did_insert;
END;
$$ LANGUAGE plpgsql;

SELECT add_event('House Party', '2012-05-03 23:00',
'2012-05-04 02:00', 'Run''s House', '97205', 'us');

Triggers


Triggers automatically fire stored procedures when some event happens, like an insert or update. They allow the database to enforce some required behavior in response to changing data.


Let's create a function that logs any event changes into logs table. First we create the table

CREATE TABLE logs (
event_id integer,
old_title varchar(255),
old_starts timestamp,
old_ends timestamp,
logged_at timestamp DEFAULT current_timestamp
);

Next, we build a function to insert old data into the log. The OLD variable represents the row about to be changed (NEW represents an incoming row, which we’ll see in action soon enough). Output a notice to the console with the event_id before returning. Then we create our trigger to log changes after any row is updated through this function.

CREATE OR REPLACE FUNCTION log_event() RETURNS trigger AS $$
DECLARE
BEGIN
INSERT INTO logs (event_id, old_title, old_starts, old_ends)
VALUES (OLD.event_id, OLD.title, OLD.starts, OLD.ends);
RAISE NOTICE 'Someone just changed event #%', OLD.event_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_events
AFTER UPDATE ON events
FOR EACH ROW EXECUTE PROCEDURE log_event();

To test our trigger let's update an event and see the trigger output:


Query - book on postgres@localhost5432 _2013-06-25_15-58-53


and the old event was logged


Query - book on postgres@localhost5432 _2013-06-25_16-00-49


Views


Views are aliased queries that can be queried by its alias like any other table. Creating a view is as simple as writing a query and prefixing it with CREATE VIEW some_view_name AS. Views are good for opening up complex queried data in a simple way. If you want to add a new column to the view, it will have to come from the underlying table.

CREATE VIEW holidays AS
SELECT event_id AS holiday_id, title AS name, starts AS date
FROM events
WHERE title LIKE '%Day%' AND venue_id IS NULL;

SELECT name, to_char(date, 'Month DD, YYYY') AS date
FROM holidays
WHERE date <= '2012-04-01';

The RULE system


The rule system (more precisely speaking, the query rewrite rule system) is totally different from stored procedures and triggers. It modifies queries to take rules into consideration, and then passes the modified query to the query planner for planning and execution. More information can be found here. Example of changing query to allow updating our holidays view:

CREATE RULE update_holidays AS ON UPDATE TO holidays DO INSTEAD
UPDATE events
SET title = NEW.name,
starts = NEW.date
WHERE title = OLD.name;

UPDATE holidays SET date = '6/25/2013' where name = 'Christmas Day';

Crosstab

crosstab(text source_sql, text category_sql) takes two queries to pivot results on one of them on the results of the other. 


  • source_sql is a SQL statement that produces the source set of data. This statement must return one row_name column, one category column, and one value column. It may also have one or more "extra" columns. The row_name column must be first. The category and value columns must be the last two columns, in that order.
  • category_sql is a SQL statement that produces the set of categories. This statement must return only one column. It must produce at least one row, or an error will be generated. Also, it must not produce duplicate values, or an error will be generated.

If this is the first time you use something from the tablefunc module, you may need to install it to your database by running CREATE EXTENSION tablefunc;

create table sales(year int, month int, qty int);
insert into sales values(2007, 1, 1000);
insert into sales values(2007, 2, 1500);
insert into sales values(2007, 7, 500);
insert into sales values(2007, 11, 1500);
insert into sales values(2007, 12, 2000);
insert into sales values(2008, 1, 1000);

select * from crosstab(
'select year, month, qty from sales order by 1',
'select m from generate_series(1,12) m'
) as (
year int,
"Jan" int,
"Feb" int,
"Mar" int,
"Apr" int,
"May" int,
"Jun" int,
"Jul" int,
"Aug" int,
"Sep" int,
"Oct" int,
"Nov" int,
"Dec" int
);

Full-Text Search


Full-text search enables the user to search text columns with parts of the column value, not the exact value like in regular queries.



  • LIKE and ILIKE : LIKE and ILIKE (case-insensitive LIKE) are the simplest forms of text search. LIKE compares column values against a given pattern string. The % and _ characters are wildcards. % matches any number of any characters, and _ matches exactly one character.

    • SELECT title FROM movies WHERE title ILIKE 'stardust%';

      • will return movies with titles like : “Stardust” or “Stardust Memories”

  • Regex : You could write the WHERE part of your queries against text columns in regular expressions. A regular expression match is led by the ~ operator, with the
    optional ! (meaning, not matching) and * (meaning case insensitive). So, to count all movies that do not begin with the, the following case-insensitive query will work. The characters inside the string are the regular expression.

    • SELECT COUNT(*) FROM movies WHERE title !~* '^the.*';

  • Bride of Levenshtein : Levenshtein is a string comparison algorithm that compares how similar two strings are by how many steps are required to change one string into another. Each replaced, missing, or added character counts as a step (Changes in case cost a point too). The distance is the total number of steps away. levenshtein() function is provided by the fuzzystrmatch contrib package.The following query return 3 because we need to replace 2 characters and and add 1 character to make bat into fads.

    • SELECT levenshtein('bat', 'fads');

  • Trigram :  A trigram is a group of three consecutive characters taken from a string. The pg_trgm contrib module breaks a string into as many trigrams as it can. Finding a matching string is as simple as counting the number of matching trigrams. The strings with the most matches are the most similar. It’s useful for doing a search where you’re OK with either slight misspellings or even minor words missing. The longer the string, the more trigrams and the more likely a match.

    • SELECT show_trgm('Avatar');  -- returns {" a"," av","ar ",ata,ava,tar,vat}

  • PostgreSQL have more natural language processing capabilities (TSVector and TSQuery) and even can search by word phonetics (metaphone)
  • You could combine text search function in many interesting ways like the following query that “Get me names that sound the most like Robi n Williams, in order.”

    • SELECT * FROM actors
      WHERE metaphone(name,8) % metaphone('Robin Williams',8)
      ORDER BY levenshtein(lower('Robin Williams'), lower(name));

 


In this post we introduced PostgreSQL, a popular open source RDBMS. In future posts we will introduce more DBMSs.