Document-Oriented Persistence with CouchDB

… a scalable, NoSQL database

MSc Project Report: Document-Oriented Persistence with CouchDB

So here, finally, is my Project Report for my MSc Computer Science project at Birkbeck, University of London.

I did the project on non-relational (so-called ‘NoSQL’) databases, with a particular focus on CouchDB. The most interesting thing about CouchDB is that it combines, in the same product, a web server with a database. It can store data as well as serve web pages. This enables it to do very interesting things, such as replicating data and code between servers, desktop PCs, and mobile devices.

Here’s a useful 2-minute video introduction to CouchDB.

To understand CouchDB more deeply, read the definitive guide. That was my starting point for the project.

Written by klena02

September 22, 2010 at 1:47 pm

Posted in CouchDB

Three Steps to CouchDB Heaven …

… just follow and you will plain-lee see!

Well, actually there are way more than three steps, but today, we’ll keep it to three, in honour of the King of Rock and Roll who died this day in 1977, aged 42.

Step 1: Find some JSON documents in need of some CouchDB love

So, let’s say you have some JSON documents, all in a folder, like this:

JSON Documents in a Folder

The documents contain valid JSON. An easy way to verify this is to paste each one into http://www.jsonlint.com.
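If you would rather check a document locally, here is a minimal sketch in JavaScript. The helper name isValidJson and both sample strings are made up for illustration; they are not taken from the real universities folder.

```javascript
// Hypothetical helper: returns true when the text parses as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

// Made-up sample inputs.
const goodSample = '{"_id": "test", "name": "Test University"}';
const badSample = "{'_id': 'test'}"; // single quotes are not valid JSON

console.log(isValidJson(goodSample));
console.log(isValidJson(badSample));
```

This only tells you whether the text parses. It says nothing about whether the fields inside are the ones you meant to store.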

So anyway, these documents aren’t happy. They have so much to offer to the world, yet here they are languishing on your hard disk.

Let’s cheer them up a little by putting them on the Couch.

Now, if you haven’t got your “CouchDB in the cloud” yet, stop reading this and go to http://www.couch.io/get and get one. You can also get one from the similarly excellent http://cloudant.com.

One tip. When you create your cloud instance of CouchDB, be sure not to leave it in “admin party” mode. Secure it with a username and password instead.

Step 2: PUT your documents on the Couch

So, how do we get your love-sick documents off your disk and up and into the heavens?

Well, we know that we can speak with CouchDB at any time using HTTP. To use HTTP from the command line, we can use the curl command. If you are using Windows, you can install curl for Windows: http://curl.haxx.se/download.html.

I’m using Ubuntu Linux because it’s free and it’s awesome.

To create a database on your new cloud-based CouchDB, you need to do this from the command line or terminal window:

curl -X PUT http://username:password@mick.couchone.com/universities

To put a single document on to the database, do this:

curl -X PUT http://username:password@mick.couchone.com/universities/test -d @test.json

Note a couple of things: you need to replace username, password, mick, universities, and test with your own values.

Also, please note that I used PUT not POST. Why? Well, we use PUT when we already know the _id of the JSON document. If you want CouchDB to assign a system-generated _id to your document, that’s when you use POST instead. More details here: http://wiki.apache.org/couchdb/HTTP_Document_API#PUT.
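The PUT-or-POST rule can be sketched as a small JavaScript function. This is only an illustration of the rule described above: the helper name chooseRequest and the example.com url are invented, and nothing here actually talks to a server.

```javascript
// Hypothetical helper: decide how a document would be sent to a CouchDB database.
// dbUrl is the url of the database, doc is a plain object.
function chooseRequest(dbUrl, doc) {
  if (typeof doc._id === "string" && doc._id.length > 0) {
    // We already know the _id, so PUT to database/_id.
    return { method: "PUT", url: dbUrl + "/" + encodeURIComponent(doc._id) };
  }
  // No _id: POST to the database and let CouchDB generate one.
  return { method: "POST", url: dbUrl };
}

// Made-up documents and a made-up database url.
const exampleDb = "http://example.com/universities";
console.log(chooseRequest(exampleDb, { _id: "University of Aberdeen" }));
console.log(chooseRequest(exampleDb, { name: "No id yet" }));
```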

… all being well, the server replies as follows:

{"ok":true,"id":"test","rev":"1-3b717529ff0f515c2c5d8aa52a2c03ab"}

… and we can go and look at our test document here:

http://mick.couchone.com/universities/test

To upload an entire folder to the Couch, I had to learn some bash scripting (on Windows, the same could be achieved with a batch file).

#!/bin/bash
# upload every .json file in ./universities, one document per file
for filepath in ./universities/*.json
do
    # strip the folder and the .json extension to get the docname
    docname=$(basename "$filepath" .json)
    # spaces are not allowed in a url, so replace each one with %20
    docid=${docname// /%20}
    url="http://username:password@mick.couchone.com/universities/${docid}"
    # put the document into CouchDB
    echo curl -X PUT "$url" -d @"$filepath"
    curl -X PUT "$url" -d @"$filepath"
done

This page helped me to write the bash script.

I saved this bash script into a file called loop. I then ran it by invoking bash loop from the terminal, and watched my documents fly off to heaven, where they live now for everyone to enjoy.
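The interesting part of the loop is turning a file path into a document url. Here is that one step as a JavaScript sketch, assuming the same folder layout as above. The helper name docUrlForFile is made up.

```javascript
// Hypothetical helper mirroring the bash loop: file path in, document url out.
function docUrlForFile(dbUrl, filepath) {
  const filename = filepath.split("/").pop();       // drop the folder part
  const docname = filename.replace(/\.json$/, "");  // drop the .json extension
  return dbUrl + "/" + encodeURIComponent(docname); // spaces become %20
}

console.log(
  docUrlForFile(
    "http://mick.couchone.com/universities",
    "./universities/University of St Andrews.json"
  )
);
```

Note that encodeURIComponent escapes more than spaces, so document names containing other awkward characters should also come out as usable urls.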

http://mick.couchone.com/universities/_all_docs

You can see the individual documents like this:

http://mick.couchone.com/universities/University%20of%20St%20Andrews

… put the output into http://jsonlint.com/ so you can see the structure of the JSON document properly.

Step 3: Put on a SHOW

Last time we looked at how design documents can hold HTML, and apply the HTML to any document we have.

Our example was this one, which was pretty simple and lame.

http://mick.couchone.com/universities/_design/default/_show/id/University%20of%20Aberdeen

… but, since it’s all “just HTML”, you can do this as well, to use the latitude and longitude on a Google map. (I used the example map code from http://code.google.com/apis/maps/documentation/javascript/tutorial.html.)

So let’s change _show/id in the url to _show/googlemap, so that the output is produced by a different server-side JavaScript function.

http://mick.couchone.com/universities/_design/default/_show/googlemap/University%20of%20Aberdeen

So, all I’ve done here is add a show to my design document. The name of the show is “googlemap”.
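To give a feel for what such a show might look like, here is a much-simplified sketch. It is not the code from the real design document: it assumes each document has numeric latitude and longitude fields (those field names are a guess), and instead of embedding a map it just links to Google Maps using a search url.

```javascript
// Sketch of a show function in the spirit of "googlemap".
// ASSUMPTION: the document has numeric latitude and longitude fields.
function googlemapSketch(doc, req) {
  var query = doc.latitude + "," + doc.longitude;
  var mapUrl = "https://www.google.com/maps/search/?api=1&query=" +
    encodeURIComponent(query);
  // Escape the ampersand so the url is valid inside an HTML attribute.
  var href = mapUrl.replace(/&/g, "&amp;");
  return "<h1>" + doc._id + "</h1>" +
    '<p><a href="' + href + '">Show on a Google map</a></p>';
}

// Made-up document with invented coordinates, only there to run the sketch.
var sampleUniversity = { _id: "Example University", latitude: 57.1, longitude: -2.1 };
var sampleMapHtml = googlemapSketch(sampleUniversity, {});
console.log(sampleMapHtml);
```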

The design document is here.

The design document also contains a view and a list, which together enable us to link to all the individual Google maps.

http://mick.couchone.com/universities/_design/default/_list/defaultlist/defaultview
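A list function receives the rows of a view one at a time and sends back output in chunks. The sketch below shows the general shape under some loud assumptions: the real defaultlist is not reproduced here, the rows are invented, and getRow and send are stand-ins written so the code runs outside CouchDB (inside CouchDB they are provided for you).

```javascript
// Stand-ins for the helpers CouchDB gives to list functions.
// They exist only so this sketch can run on its own.
var pendingRows = [
  { id: "Example University A", key: "Example University A", value: null },
  { id: "Example University B", key: "Example University B", value: null }
];
var output = [];
function getRow() { return pendingRows.shift(); }
function send(chunk) { output.push(chunk); }

// Sketch of a list function: one link per view row, pointing at the show.
function defaultListSketch(head, req) {
  send("<ul>");
  var row;
  while ((row = getRow())) {
    // Relative link from _list/defaultlist/defaultview to _show/googlemap/<id>
    var href = "../../_show/googlemap/" + encodeURIComponent(row.id);
    send('<li><a href="' + href + '">' + row.id + "</a></li>");
  }
  send("</ul>");
}

defaultListSketch({}, {});
var listHtml = output.join("");
console.log(listHtml);
```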

Written by klena02

August 16, 2010 at 10:50 am

Posted in CouchDB

CouchDB Design Documents

Hello folks, I’m back. Did you miss me? Oh how nice of you to say so.

So when we last chatted I mentioned I was doing my Master’s project about CouchDB.

To get started with CouchDB, go and get a CouchDB instance hosted by the wonderful people at couch.io:
http://www.couch.io/get

Today, I want to talk a little about design documents in CouchDB.

Don’t be scared, stay with me and we’ll have a great time.

A design document in CouchDB is a document which stores application code.
http://mick.couchone.com/universities/_design/default

Let’s see if we can try to understand this design document, and what it is doing.

Look at this url – it’s a little bit crazy.
http://mick.couchone.com/universities/_design/default/_show/id/Aberystwyth%20University

So, what does this crazy url consist of?

http://mick.couchone.com/universities is a database.

To see it in “glorious technicolor”, you can use the built-in CouchDB utility, Futon: http://mick.couchone.com/_utils/database.html?universities

Are you more of a purist? Want to see all the documents in this database in no-frills JSON format?
http://mick.couchone.com/universities/_all_docs
Note that what you see here is a set of key-value pairs.

You put data in to CouchDB in the form of JSON documents.
Internally, CouchDB stores the data in the form of indexed key-value pairs.
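To make the key-value idea concrete, here is a small sketch that picks the keys out of an _all_docs style response. The response object is made up (two rows only, one with an invented revision), but it follows the general shape CouchDB returns: a rows array where each row has an id, a key, and a value holding the revision.

```javascript
// Made-up response in the general shape returned by _all_docs.
var allDocsResponse = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: "test", key: "test", value: { rev: "1-3b717529ff0f515c2c5d8aa52a2c03ab" } },
    { id: "University of Aberdeen", key: "University of Aberdeen", value: { rev: "1-example" } }
  ]
};

// Hypothetical helper: collect the key of every row.
function listDocKeys(response) {
  return response.rows.map(function (row) { return row.key; });
}

console.log(listDocKeys(allDocsResponse));
```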

Anyway, sorry for that digression, let’s return to that crazy url.

http://mick.couchone.com/universities is our database.

CouchDB is a document-oriented database – it contains – yes, you guessed it – documents.

Most of the documents contain data, in our case data to do with universities, like this one.
http://mick.couchone.com/universities/Aberystwyth%20University

One of the many amazing things about CouchDB is that documents can contain anything you can express as JSON.

So, as well as having documents that contain data, we can also – wait for it – have documents that contain code as well.

As we all know it’s a very very good idea to separate code from data.
So, CouchDB likes us to put code in design documents.

The way I like to think about it is that a design document is like an application, using the data in the database.

Anyway, because I am very unimaginative I have called my application “default”.

So, to create my application, I create a design document, and give it an id according to CouchDB’s special naming convention.

http://mick.couchone.com/universities/_design/default

The database is “universities”, the name of the document is “_design/default” – it’s a little confusing to have a forward slash in the name of the document, but that’s the convention that CouchDB uses for design documents.

Within the design document, there is a section named “shows”. This gives us a way to show the data for a particular document.


"shows":{
"id":"function(doc,req){ return '<h1>' + doc._id + '</h1>' }"
}

Here, I’ve defined one show, called “id”. It’s a function which accepts as parameters a document (doc) and a request object (req).
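Because the show is stored as a string inside a JSON document, it can be tried out locally before it goes anywhere near CouchDB. The sketch below is just a way to experiment: new Function is a local stand-in for whatever CouchDB does internally to compile the text, and the request object is left empty.

```javascript
// The "shows" section from above, as a plain JavaScript object.
var designDocFragment = {
  shows: {
    id: "function(doc,req){ return '<h1>' + doc._id + '</h1>' }"
  }
};

// Turn the stored source text back into a callable function.
var showFn = new Function("return (" + designDocFragment.shows.id + ");")();

// Call it the way a show is called: with a document and a request object.
var shownHtml = showFn({ _id: "Aberystwyth University" }, {});
console.log(shownHtml); // <h1>Aberystwyth University</h1>
```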

This is the url to access this show.
http://mick.couchone.com/universities/_design/default/_show/id/Aberystwyth%20University
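If the url still looks crazy, it may help to pull it apart in code. This sketch splits a show url into its pieces; the helper name parseShowUrl is invented, and it assumes the url has exactly the database/_design/name/_show/show/docid layout described above.

```javascript
// Hypothetical helper: split a show url into its named parts.
function parseShowUrl(showUrl) {
  const parts = new URL(showUrl).pathname.split("/").filter(Boolean);
  // Expected layout: [database, "_design", name, "_show", show, docId]
  return {
    database: parts[0],
    designDoc: parts[1] + "/" + parts[2],
    show: parts[4],
    docId: decodeURIComponent(parts[5])
  };
}

console.log(
  parseShowUrl(
    "http://mick.couchone.com/universities/_design/default/_show/id/Aberystwyth%20University"
  )
);
```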

This one is very simple. You can see how you could use this to build up a web page showing all the data relating to a particular university.

That’s what I intend to do next.

Written by klena02

August 10, 2010 at 9:51 pm

Posted in CouchDB

Project Proposal

Written by klena02

April 3, 2010 at 7:40 pm

Posted in CouchDB

CouchDB

A database in CouchDB is simply a collection of documents. election-2005 is a database containing results from all UK constituencies for the 2005 election, taken from www.electoralcalculus.co.uk.

Futon Overview

 

Each document has a unique ID (in this case I’m using the constituency name), and a revision number (shown in green) which is used to provide concurrency control.
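The idea behind the revision number is that an update must name the revision it was based on, and is refused if that revision is no longer current (CouchDB reports this as a conflict). The toy model below imitates only that rule. It is not how CouchDB is implemented: the store is an in-memory Map, the revision strings are fake, and the electorate figures are invented.

```javascript
// Toy in-memory store: document id -> latest stored version.
const toyStore = new Map();

// Toy update: succeeds only when doc._rev matches the current revision.
function toyPut(id, doc) {
  const current = toyStore.get(id);
  const currentRev = current ? current._rev : undefined;
  if (doc._rev !== currentRev) {
    return { error: "conflict" }; // stale or missing revision
  }
  const generation = current ? parseInt(currentRev, 10) + 1 : 1;
  const newRev = generation + "-toy";
  toyStore.set(id, Object.assign({}, doc, { _rev: newRev }));
  return { ok: true, id: id, rev: newRev };
}

// Invented numbers, purely to exercise the rule.
const firstWrite = toyPut("Isle of Wight", { electorate: 100 });
const secondWrite = toyPut("Isle of Wight", { _rev: firstWrite.rev, electorate: 101 });
const staleWrite = toyPut("Isle of Wight", { _rev: firstWrite.rev, electorate: 102 });

console.log(firstWrite, secondWrite, staleWrite);
```

The third write is refused because it is based on the first revision, which was already replaced by the second write.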

 

A constituency record, showing total electorate plus votes for con, lab, lib, Plaid Cymru (pc), and others.

 

The same record in its native JSON format.

 

Here’s how to query the collection: in this case, to get all constituencies ordered by number of Conservative votes. CouchDB requires us to supply a Map function, which ranges over the document collection and emits a set of key-value pairs. This creates an index on this value in the B+ tree.
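As a rough sketch, a Map function for this query could look like the one below. It assumes the Conservative vote count lives in a field called con, which is a guess based on the record layout. The emit function and the sort are local stand-ins so that the code runs outside CouchDB, and the sample documents use invented numbers.

```javascript
// Stand-in for CouchDB's emit(): collects rows so the sketch runs locally.
var emittedRows = [];
function emit(key, value) { emittedRows.push({ key: key, value: value }); }

// ASSUMPTION: each constituency document stores Conservative votes in "con".
// The key is used for ordering; the value carries the count as well.
function mapByConVotes(doc) {
  if (typeof doc.con === "number") {
    emit(doc.con, doc.con);
  }
}

// Made-up documents with invented vote counts.
var sampleDocs = [
  { _id: "Constituency A", con: 1500 },
  { _id: "Constituency B", con: 900 },
  { _id: "Constituency C", con: 2100 }
];
sampleDocs.forEach(mapByConVotes);

// CouchDB keeps view rows sorted by key; sort here to imitate that.
emittedRows.sort(function (a, b) { return a.key - b.key; });
console.log(emittedRows);
```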

 

Just in case you’re interested, the constituency with the highest number of Conservative votes is the Isle of Wight.

 

Now, let’s look at a where clause. Let’s say you want to get all constituencies where the number of Conservative votes is between 1000 and 2000. What we want is to retrieve a subset of the index we created a minute ago. To do this, we append a query string to the end of the url: ?startkey=1000&endkey=2000.
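Here is a small sketch of both halves of that idea: building the query string, and what the range selection amounts to on a sorted set of rows. The view url and the rows are invented for illustration, and the filter is only a local imitation of what CouchDB does with its index.

```javascript
// Hypothetical helper: append startkey and endkey to a view url.
// Keys are sent as JSON, so numbers stay numbers.
function rangeQueryUrl(viewUrl, startkey, endkey) {
  return viewUrl +
    "?startkey=" + encodeURIComponent(JSON.stringify(startkey)) +
    "&endkey=" + encodeURIComponent(JSON.stringify(endkey));
}

// Local imitation of the range lookup (both ends included).
function selectRange(sortedRows, startkey, endkey) {
  return sortedRows.filter(function (row) {
    return row.key >= startkey && row.key <= endkey;
  });
}

// Made-up view url and made-up rows, already sorted by key.
var exampleViewUrl = "http://example.com/election-2005/_design/example/_view/by_con";
var exampleRows = [
  { key: 900, value: 900 },
  { key: 1500, value: 1500 },
  { key: 2100, value: 2100 }
];

console.log(rangeQueryUrl(exampleViewUrl, 1000, 2000));
console.log(selectRange(exampleRows, 1000, 2000));
```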

 

Finally, here is an aggregate of all Conservative votes (8,782,198). To do this, we use a Reduce function, which sums the values produced by the Map function.
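A summing Reduce function is very short. The sketch below assumes the Map function emitted the vote count as its value. The sum helper is a local stand-in so the code runs outside CouchDB, and the numbers are invented, not the real election figures.

```javascript
// Local stand-in for a sum helper, so this runs outside CouchDB.
function sum(values) {
  return values.reduce(function (total, n) { return total + n; }, 0);
}

// Sketch of a reduce function: add up the values emitted by the map.
// When rereduce is true, values are partial sums, and summing still works.
function reduceTotalVotes(keys, values, rereduce) {
  return sum(values);
}

// Invented vote counts.
var firstPass = reduceTotalVotes(null, [1500, 900], false);
var secondPass = reduceTotalVotes(null, [2100], false);
var grandTotal = reduceTotalVotes(null, [firstPass, secondPass], true);
console.log(grandTotal); // 4500
```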

 

Written by klena02

February 1, 2010 at 9:59 am

Posted in CouchDB
