Tuesday 7 January 2014

RSS

RSS Tutorial

RSS is a protocol that provides an open method of syndicating and aggregating Web content.

Technically, RSS is a Syndication Standard based on a type of XML file that resides on an internet server.

This tutorial explains what RSS is, how feeds are read and published, and how the related Atom format is structured.

What is RSS?

RSS is an open method for delivering regularly changing web content. Many news-related sites, weblogs and other online publishers syndicate their content as an RSS Feed to whoever wants it.

Any time you want to retrieve the latest headlines from your favorite sites, you can access the available RSS feeds via a desktop RSS reader. You can also make an RSS feed for your own site if your content changes frequently.

In brief:

RSS is a protocol that provides an open method of syndicating and aggregating Web content.
RSS is a standard for publishing regular updates to web-based content.
RSS is a Syndication Standard based on a type of XML file that resides on an internet server.
RSS is an XML application. RSS 1.0 also conforms to the W3C's RDF specification, and RSS 2.0 can be extended through XML namespaces.
You can also download RSS feeds from other sites to display updated news items on your site, or use a desktop or online reader to access your favorite RSS feeds.

What does RSS stand for? It depends on what version of RSS you are using.

RSS Version 0.90 - RDF Site Summary
RSS Version 0.91 - Rich Site Summary
RSS Version 1.0 - RDF Site Summary
RSS Versions 2.0 and 2.0.1 - Really Simple Syndication

What is an RSS Feed?

The RSS feed is a plain-text XML file that resides on an Internet server.
The RSS feed file includes basic information about a site (title, URL, description), plus one or more item entries that include, at a minimum, a title (headline), a URL, and a brief description of the linked content.
There are various flavors of RSS feed, depending on the RSS version. Another XML feed format is called Atom.
RSS feeds can be registered with an RSS directory to make them easier to find for readers interested in your content area.
RSS feeds can link back to your website, which can bring more traffic to your site.
Some RSS feeds are updated hourly (Associated Press and news groups), some are updated daily, and others are updated weekly or irregularly.
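
The feed file described above can be sketched in a few lines of Python. This is a minimal illustration using the standard library, not a full generator: the helper name build_rss and the example.org values are assumptions made for the example, and only the site-level title, URL and description plus one item are emitted.

```python
import xml.etree.ElementTree as ET

def build_rss(site_title, site_url, site_description, items):
    """Return a minimal RSS 2.0 document as a string.

    `items` is a list of (title, link, description) tuples.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Basic information about the site itself.
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = site_description
    # One <item> per headline.
    for title, link, description in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")

# Hypothetical site and story, used only for illustration.
feed_xml = build_rss(
    "Example News",
    "http://example.org/",
    "Headlines from a hypothetical site",
    [("First story", "http://example.org/1", "A brief description.")],
)
print(feed_xml)
```

Saving the returned string to a file on a web server would give readers something to subscribe to.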

How Does RSS Work?

This is how RSS works:

A website that wants to publish its content using RSS creates an RSS feed and keeps it on a web server. RSS feeds can be created manually or with software.
A website visitor subscribes to the RSS feed, which is then read by an RSS feed reader.
The RSS feed reader reads the RSS feed file and displays it. The reader displays only new items from the feed.
The RSS feed reader can be customized to show content from one or more RSS feeds, based on your own interests.
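
The "displays only new items" step can be sketched as follows. This is an illustration under stated assumptions rather than how any particular reader works: the feed is a hard-coded string instead of a download, the function name new_items is made up, and an item's link is assumed to be a good enough key for "already seen".

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real reader would download this from the server.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
    <item><title>Story A</title><link>http://example.org/a</link></item>
    <item><title>Story B</title><link>http://example.org/b</link></item>
  </channel>
</rss>"""

def new_items(feed_xml, seen_links):
    """Return (title, link) pairs not shown before and record them as seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link not in seen_links:
            seen_links.add(link)
            fresh.append((item.findtext("title"), link))
    return fresh

seen = {"http://example.org/a"}             # pretend Story A was read earlier
first_pass = new_items(SAMPLE_FEED, seen)   # only Story B is new
second_pass = new_items(SAMPLE_FEED, seen)  # nothing new on the next poll
print(first_pass, second_pass)
```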

News Aggregators and Feed Readers:

RSS feed readers and news aggregators are essentially the same thing: pieces of software used for viewing RSS feeds. News aggregators are designed specifically to view news-related feeds, but technically they can read any feed.

Who can use RSS?

RSS started out with the intent of distributing news-related headlines. Its potential is significantly larger: any content that changes regularly can be syndicated with it.

Consider using RSS for the following:

New Homes - Realtors can provide updated feeds of new home listings on the market.
Job Openings - Placement firms and newspapers can provide a classifieds feed of job vacancies.
Auction Items - Auction vendors can provide feeds containing items that have been recently added to eBay or other auction sites.
Press Distribution - Listing of new releases.
Schools - Schools can relay homework assignments and quickly announce school cancellations.
News & Announcements - headlines, notices and any list of announcements.
Entertainment - Listings of the latest TV programs or movies at local theatres.

Summary:

RSS is becoming more popular every day. The reason is fairly simple: RSS is a free and easy way to promote a site and its content without the need to advertise or create complicated content-sharing partnerships.

RSS Advantages

RSS is taking off quickly because people like it. It is easy to use, and it has advantages for publishers as well as for subscribers. A few of these advantages are listed below.

RSS Advantages for Subscribers:

RSS subscribers are the people who subscribe to read a published feed. Here are some of the advantages of RSS feeds for subscribers.

All news in one place: You can subscribe to multiple news groups and then customize your reader to show all the news on a single page. This saves you a lot of time.
News when you want it: Rather than waiting for an e-mail, you go to your RSS reader when you want to read the news. Furthermore, RSS feeds display more quickly than information on Web sites, and you can read them offline if you prefer.
Get only the news you want: An RSS feed comes in the form of headlines and brief descriptions, so you can easily scan the headlines and click only the stories that interest you.
Freedom from e-mail overload: You do not get an e-mail for every news or blog update. You just go to your reader, which picks up new items automatically whenever the feed on the server changes.
Easy republishing: You may be both a subscriber and a publisher. For example, you may have a Web site that collects news from various other sites and then republishes it. RSS allows you to easily capture that news and display it on your site.

RSS Advantages for Publishers:

RSS publishers are the people who publish their content through an RSS feed. Use RSS if you want to get your message out easily, want people to see what you publish, and want your news to bring people back to your site.

Here are some of the advantages of RSS if you publish on the Web:

Easier publishing: RSS is really simple publishing. You don't have to maintain a database of subscribers to send your information to; instead, they access your feed using a reader and get updated content automatically.
A simpler writing process: If you have new content on your Web site, you need only write an RSS feed in the form of titles and short descriptions, and link back to your site.
An improved relationship with your subscribers: Because people subscribe from their side, they don't feel as if you are pushing your content on them.
The assurance of reaching your subscribers: RSS is not subject to spam filters. Your subscribers get the feeds they subscribe to and nothing more.
Links back to your site: RSS feeds always include links back to a Web site. This can bring a lot of traffic to your website.
Relevance and timeliness: Your subscribers always have the latest information from your site.

RSS Version History

RSS was first introduced by Netscape, which wanted to use an XML format to distribute news, stories and information. Netscape refined its version of RSS and then dropped it.

Later, UserLand Software took over the RSS specifications and released newer RSS versions. It continued development of its own version of RSS and eventually released RSS v2.

RSS has been released in many different versions.

12/27/97 - Dave Winer at Userland developed scriptingNews. RSS was born.
3/15/99 - Netscape developed RSS 0.90 (which supported scriptingNews). This was simply XML with an RDF Header and it was used for my.netscape.com
6/15/99 - Dave Winer at UserLand develops scriptingNews 2.0b1 which included Netscape's RSS 0.90 features also
7/10/99 - Netscape developed RSS 0.91. In this version they removed the RDF header, but included most features from scriptingNews 2.0b1.
7/28/99 - UserLand deprecated scriptingNews formats and adopted only RSS 0.91
Netscape stops their RSS development
6/4/00 - UserLand releases the official RSS 0.91 specification
8/14/00 - A group led by Rael Dornfest at O'Reilly developed RSS 1.0. This format uses RDF and namespaces. This version is often mistaken for a new version of 0.91, but it is a completely new format with no ties to RSS 0.91.
12/25/00 - Dave Winer at UserLand develops RSS 0.92 which is 0.91 with optional elements.
04/20/01 - RSS0.93 was discussed but never deployed
03/14/02 - MetaWeblog API merged RSS 0.92 with XML-RPC to provide a powerful blogging API.
09/18/02 - Dave Winer developed RSS 2.0 after leaving Userland. This is 0.92 with optional elements. MetaWeblog API updated for RSS 2.0. While in development, this format was called 0.94.
07/15/03 - Official Spec RSS 2.0 was released through Harvard under a Creative Commons license

Which RSS Version Should be Used?

There is no consensus on which RSS version to use; it is up to you. I would personally suggest the latest one, RSS 2.0, which is simple to use and easy to learn.

About 50 % of all RSS feeds use RSS 0.91
About 25 % use RSS 1.0
The last 25 % is split between RSS 0.9x versions and RSS 2.0

RSS Feed Formats

RSS has been released in many different versions over the last 10 years. Here are details on the three most commonly used versions.

RSS v0.91 Feed Format:

RSS v0.91 was originally released by Netscape in 1999.
RSS v0.91 does not have RDF header.
RSS v0.91 is called Rich Site Summary (RSS)
RSS v0.91 has features from Dave Winer's RSS version scriptingNews 2.0b1.
RSS v0.91 has support for international languages and encodings.
RSS v0.91 has support for image height and width definitions.
RSS v0.91 has support for description text for headlines.

RSS v1.0 Feed Format:

RSS 1.0 is the only version that was developed using the W3C RDF (Resource Description Framework) standard. This version of RSS is called RDF Site Summary.
RSS 0.91 and RSS 2.0 are easier to understand than RSS 1.0.

RSS v2.0/2.0.1 Feed Format:

RSS 2.0/2.0.1 is very similar to RSS 0.9x. RSS 2.0/2.0.1 adds namespace modules and six optional elements to RSS 0.9x.
The RSS 2.0/2.0.1 specification was written by Dave Winer of UserLand. The copyright was later transferred to Harvard University.

RSS Reading Feeds

Many sites offer RSS feeds, which you can identify by a small yellow button. However, if you click one of these buttons, you will most likely get a page full of code in your browser.

To properly read the feed, you need an RSS reader. Here are the steps to get and use an RSS feed reader.

Step 1 - Get an RSS Feed Reader

There are many RSS readers available. Some work as web services, and some are limited to Windows (or Mac, PDA or UNIX). Here are a few you can try:

NewsGator Online - A free online RSS reader. Includes synchronization with Outlook, viewing TV content with Media Center Edition, and publication of blogs and headlines.
RssReader - A free Windows-based RSS reader. Supports RSS versions 0.9x, 1.0 and 2.0 and Atom 0.1, 0.2 and 0.3.
FeedDemon - A Windows-based RSS reader. Very easy to use and has a very orderly interface. However, this is not freeware!
blogbot - An RSS reader plug-in for Outlook or Internet Explorer. The light-version for Internet Explorer is free.

Step 2 - RSS Reader Installation

All the readers come with installation instructions, so follow the instructions provided to install your RSS reader on your computer.

When you first launch a standalone reader, most often you will see a toolbar and three window panes arranged much like the preview mode in Microsoft Outlook. The pane on the left side typically displays RSS feeds, or channels, to which you are subscribed. These can be organized into categories or folders.

The upper-right panel typically shows a list of articles within whichever channel is selected, and the article content is then displayed in the lower-right panel. To change channel groups, just click the drop-down box at the upper left beneath the menus. Sometimes a brief description will appear in the lower right; if so, click the link in the article to load the complete text.

Some standalone apps can be configured to send you e-mail every time there's a new article on a topic you're interested in.

Step 3 - Add Channels and Channel groups

To add a channel, i.e. an RSS feed, go to the RSS page of any site using its yellow RSS button. Right-click or use CTRL+C to copy the URL from the address bar of your browser, which should show a page full of XML code.

Now go back to your newsreader, choose the category where you want the new subscription to live (Business, Entertainment, the New York Times), and select New or New Channel from the File menu. In most cases, the URL you copied should automatically be pasted into the URL field in the New Channel wizard. If not, you can cut and paste the URL yourself.

Step 4 - Customize RSS Reader:

When you accumulate lots of articles from your various feeds, it can become difficult to find specific information. Fortunately, newsreaders include useful tools for finding articles.

A Filter tool will show only articles that contain a keyword you specify. This may also be labeled Search. To use it, type a keyword directly into the Filter/Search bar.

Some readers include the ability to set a watch, an automatic search through all your incoming feeds for a specific keyword. For example, you could enter ICQ as a watch. If any article in any feed you subscribe to mentions ICQ, the article will be included in the Watch list.
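
A watch is essentially a keyword filter applied to every incoming article. The sketch below shows the idea in Python; the article dictionaries and the function name watch are assumptions for illustration, and real readers may match on more fields or support richer search syntax.

```python
def watch(articles, keyword):
    """Return articles whose title or description mentions `keyword`, ignoring case."""
    needle = keyword.lower()
    return [
        a for a in articles
        if needle in a["title"].lower() or needle in a["description"].lower()
    ]

# Hypothetical articles gathered from several feeds.
articles = [
    {"title": "ICQ releases update", "description": "Messaging news."},
    {"title": "Weather report", "description": "Sunny with clouds."},
    {"title": "Chat roundup", "description": "Covers icq and others."},
]
watch_list = watch(articles, "ICQ")
for article in watch_list:
    print(article["title"])
```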

Check the help section of your reader to find more options for customizing it to your needs.

Step 5 - Cleaning unwanted feeds:

Eventually, you'll probably end up with more feeds than you want or can read regularly. In most readers, to delete a feed you're no longer interested in, you simply delete its title. Then your RSS reader won't seek out that information anymore, and you won't get any content from the publisher unless you go to its site or resubscribe to the feed.

RSS Feed Publishing

This section explains how to put an RSS feed for your site on the web. If you don't know how to prepare an RSS feed file, please go through RSS Feed Formats first.

Uploading RSS Feed:

Here are simple steps to put your RSS Feed on the web.

First, decide which version of RSS you are going to use for your site. I would recommend using the latest version available.
Create your RSS feed in a text file with the extension .xml or .rdf. Upload this file to your web server.
Validate your RSS feed before making it live. See the section on RSS feed validation.
Create a link on your Web pages to the RSS feed file. Use a small yellow button for the link.

That's it. Your RSS feed is now online and people can start using it. There are also ways to promote your RSS feed so that more people use it.

Promote Your RSS Feed:

Submit your RSS feed to RSS feed directories. There are many directories available on the web where you can register your feed. A few are listed here:
- Syndic8: Over 300,000 feeds listed.
- Daypop: Over 50,000 feeds listed.
- Newsisfree: Over 18,000 feeds.
Register your feed with the major search engines. As with your web pages, you can submit your feed to the following major search engines.
- Yahoo - http://publisher.yahoo.com/promote.php
- Google - http://www.google.com/intl/zh-cn/webmasters/addfeed.html
- MSN - http://rss.msn.com/publisher.armx

Keeping up-to-date Feed:

As explained earlier, an RSS feed makes sense for sites that change their content very frequently, such as news or blogging sites.

Once your feed is registered with Google, Yahoo, and MSN, make sure that you update your content frequently and that your RSS feed is constantly available.

RSS Validation & Validators

If you have created an RSS feed for your news group, weblog, or any other purpose, it is your responsibility to ensure that the feed file can be parsed by the XML parser of any subscribing site.

Many RSS feed creation tools validate the XML when the feed is created, but some do not. Note that small errors can make your feed unreadable by standard feed readers.

So before publishing your RSS feed, make sure you have validated it. You can upload your RSS feed file to your web server and then enter its URL in one of the following validators to check the syntax.

Feed Validator - This validator validates multiple syndication formats: RSS 0.90, 0.91, 0.92, 0.93, 0.94, 1.0, 1.1, and 2.0. It includes validation for common namespaces.
RSS Validator - If you are using RSS 0.91 or RSS 0.92, you can use this validator to validate your RSS feed.
Experimental Online RSS 1.0 Validator - If you are using RSS 1.0, you can use this validator.
Redland RSS 1.0 Validator and Viewer - This is not just a validator; it also acts as an RSS to HTML converter.
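
Before submitting a feed to an online validator, a quick local check can catch the most basic problems. The sketch below is not a substitute for the validators listed above: it only tests that the file is well-formed XML and that an RSS 0.9x/2.0 style channel has a title, link and description. The function name check_rss is made up for this example, and RSS 1.0 feeds (which use an RDF root element) are outside its scope.

```python
import xml.etree.ElementTree as ET

def check_rss(feed_xml):
    """Return a list of problems found; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError as err:
        # A single unclosed or mismatched tag is enough to end up here.
        return ["not well-formed XML: %s" % err]
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        return ["expected an <rss> root with a <channel> child"]
    problems = []
    for name in ("title", "link", "description"):
        if channel.find(name) is None:
            problems.append("channel is missing <%s>" % name)
    return problems

good = ('<rss version="2.0"><channel><title>T</title>'
        '<link>http://example.org/</link><description>D</description>'
        '</channel></rss>')
broken = '<rss version="2.0"><channel><title>T</channel></rss>'  # <title> never closed

print(check_rss(good))
print(check_rss(broken))
```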

NOTE: If you find that any of the above mentioned links is not available then please send me an email at webmaster@only4programmers.blogspot.in so that I can correct it, Thanks

What is Atom 1.0?

Atom is the name of an XML-based Web content and metadata syndication format, and an application-level protocol for publishing and editing Web resources belonging to periodically updated websites.

Atom is a relatively recent spec and is more robust and feature-rich than RSS. For instance, an RSS 2.0 item may omit its title as long as it has a description, whereas Atom requires an id, a title, and an updated timestamp for both the feed and every entry.

All Atom feeds must be well-formed XML documents, and are identified with the application/atom+xml media type.

Structure of an Atom 1.0 Feed:

A feed consists of some metadata, followed by any number of entries. Here is the basic structure of an Atom 1.0 feed:

<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>...</title>
  <link href="..."/>
  <updated>...</updated>
  <author>
    <name>...</name>
  </author>
  <id>...</id>

  <entry>
    <title>...</title>
    <link href="..."/>
    <id>...</id>
    <updated>...</updated>
    <summary>...</summary>
  </entry>

</feed>

Atom 1.0 Feed Tags:

An Atom 1.0 Feed Document will be constructed of the following two elements:

<feed> Elements
<entry> Elements

Some common constructs are shared by the above two elements; they are explained under Common Constructs.

<feed> Elements:

Feed ID:

This identifies the feed using a universally unique and permanent URI. If you have a long-term, renewable lease on your Internet domain name, then you can feel free to use your website's address.

Syntax:

<id>http://only4programmers.blogspot.com/</id>

Required:

Required

Feed title:

This contains a human readable title for the feed. Often the same as the title of the associated website. This value should not be blank.

Syntax:

<title>Tutorials and Reference Manuals</title>

Required:

Required

Feed update date:

This indicates the last time the feed was modified in a significant way. All timestamps in Atom must conform to RFC 3339.

Syntax:

<updated>2003-12-13T18:30:02Z</updated>

Required:

Required
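
An RFC 3339 timestamp for the updated element can be produced with Python's standard library. This is a minimal sketch that always writes UTC with a trailing Z, which is one of the forms RFC 3339 allows; the helper name rfc3339 is an assumption for the example.

```python
from datetime import datetime, timezone

def rfc3339(moment):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

stamp = rfc3339(datetime(2003, 12, 13, 18, 30, 2, tzinfo=timezone.utc))
print("<updated>%s</updated>" % stamp)  # <updated>2003-12-13T18:30:02Z</updated>
```

Passing datetime.now(timezone.utc) instead would give the current time in the same format.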

Feed Author:

This names one author of the feed. A feed may have multiple author elements. A feed must contain at least one author element unless all of the entry elements contain at least one author element.

An author element can have <name>, <email> and <uri> tags.

Syntax:

<author>
  <name>Shanavas Rahiman</name>
  <email>shanavas@only4programmers.blogspot.com</email>
  <uri>http://only4programmers.blogspot.com/</uri>
</author>

Required:

Optional, but recommended

Feed link

This identifies a related Web page. The type of relation is defined by the rel attribute. A feed is limited to one alternate per type and hreflang. A feed should contain a link back to the feed itself.

Syntax:

<link href="http://example.org/"/>

Required:

Optional, but recommended

Feed Category:

This specifies a category that the feed belongs to. A feed may have multiple category elements.

Syntax:

<category term="sports"/>

Required:

Optional

Feed Contributor:

This names one contributor to the feed. A feed may have multiple contributor elements.

Syntax:

<contributor>
  <name>Mohtashim</name>
</contributor>

Required:

Optional

Feed generator:

This identifies the software used to generate the feed, for debugging and other purposes. Both the uri and version attributes are optional.

Syntax:

<generator uri="/myblog.php" version="1.0">
  Example Toolkit
</generator>

Required:

Optional

icon Tag:

This identifies a small image which provides iconic visual identification for the feed. Icons should be square.

Syntax:

Required:

Optional

logo Tag:

This identifies a larger image which provides visual identification for the feed. Images should be twice as wide as they are tall.

Syntax:

Required:

Optional

rights Tag:

This conveys information about rights, e.g. copyrights, held in and over the feed.

Syntax:

Required:

Optional

subtitle Tag:

This contains a human-readable description or subtitle for the feed.

Syntax:

<subtitle>A sub title </subtitle>

<entry> Elements:

An Atom feed may contain one or more entry elements. Here's a list of the required and optional entry elements.

Entry ID:

This identifies the entry using a universally unique and permanent URI. Two entries in a feed can have the same value for id if they represent the same entry at different points in time.

Syntax:

<id>http://example.com/blog/1234</id>

Required:

Required

Entry Title:

This contains a human readable title for the entry. This value should not be blank.

Syntax:

<title>Atom 1.0 Tutorial</title>

Required:

Required

Entry update date:

This indicates the last time the entry was modified in a significant way. This value need not change after a typo is fixed, only after a substantial modification. Generally, different entries in a feed will have different updated timestamps.

Syntax:

<updated>2003-12-13T18:30:02Z</updated>

Required:

Required

Entry Author:

This names one author of the entry. An entry may have multiple authors. An entry must contain at least one author element unless there is an author element in the enclosing feed, or there is an author element in the enclosed source element.

Syntax:

<author>
  <name>Mohtashim</name>
</author>

Required:

Optional, but recommended

Entry Content:

This contains or links to the complete content of the entry. Content must be provided if there is no alternate link, and should be provided if there is no summary.

Syntax:

<content>complete story here</content>

Required:

Optional, but recommended

Entry link:

This identifies a related Web page. The type of relation is defined by the rel attribute. An entry is limited to one alternate per type and hreflang. An entry must contain an alternate link if there is no content element.

Syntax:

<link href="http://example.org/2003/12/13/atom03"/>

Required:

Optional, but recommended

Entry summary:

This conveys a short summary, abstract, or excerpt of the entry. Summary should be provided if there either is no content provided for the entry, or that content is not inline.

Syntax:

<summary>Some text.</summary>

Required:

Optional, but recommended

Entry Category:

This specifies a category that the entry belongs to. An entry may have multiple category elements.

Syntax:

<category term="sports"/>

Required:

Optional

Entry contributor:

This names one contributor to the entry. An entry may have multiple contributor elements.

Syntax:

<contributor>
  <name>Mohtashim</name>
</contributor>

Required:

Optional

Published Tag:

This contains the time of the initial creation or first availability of the entry.

Syntax:

Required:

Optional

Entry source:

If an entry is copied from one feed into another feed, then the source feed's metadata (all child elements of feed other than the entry elements) should be preserved if the source feed contains any of the child elements author, contributor, rights, or category and those child elements are not present in the source entry.

Syntax:

<source>
  <id>http://moretutorials.org/</id>
  <title>Tutorials and Reference Manuals</title>
  <updated>2007-07-13T18:30:02Z</updated>
  <rights>© 2007 More Tutorials.</rights>
</source>

Required:

Optional

Entry rights:

This conveys information about rights, e.g. copyrights, held in and over the entry.

Syntax:

<rights type="html">
  © 2007 only4programmers.blogspot.com
</rights>

Required:

Optional

Common Constructs:

Content

<content> either contains, or links to, the complete content of the entry.

In the most common case, the type attribute is text, html, or xhtml, in which case the content element is defined identically to other text constructs, which are described under Text below.

Otherwise, if the src attribute is present, it represents the URI of where the content can be found. The type attribute, if present, is the media type of the content.

Otherwise, if the type attribute ends in +xml or /xml, then an xml document of this type is contained inline.

Otherwise, if the type attribute starts with text, then an escaped document of this type is contained inline.

Otherwise, a base64 encoded document of the indicated media type is contained inline.
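
The chain of "otherwise" rules above can be read as a small decision function. The Python sketch below follows the order given in the text; the function name content_handling and its return labels are made up for illustration, and treating a missing type as "text" when no src is present is an assumption based on the default described for text constructs.

```python
def content_handling(type_attr=None, src=None):
    """Classify how a <content> element's body should be interpreted."""
    kind = type_attr.lower() if type_attr else None
    if kind in ("text", "html", "xhtml"):
        return "text construct"
    if src is not None:
        return "out-of-line"        # content lives at the URI in src
    if kind is None:
        return "text construct"     # assumption: missing type defaults to "text"
    if kind.endswith("+xml") or kind.endswith("/xml"):
        return "inline xml"
    if kind.startswith("text"):
        return "escaped text"
    return "base64"

for args in [("html", None), ("image/png", "http://example.org/a.png"),
             ("application/svg+xml", None), ("text/plain", None),
             ("image/png", None)]:
    print(args, "->", content_handling(*args))
```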

Link

<link> is patterned after HTML's link element. It has one required attribute, href, and five optional attributes: rel, type, hreflang, title, and length.

href is the URI of the referenced resource (typically a Web page)

rel contains a single link relationship type. It can be a full URI or one of the following predefined values (default=alternate):

alternate: an alternate representation of the entry or feed, for example a permalink to the html version of the entry, or the front page of the weblog.
enclosure: a related resource which is potentially large in size and might require special handling, for example an audio or video recording.
related: a document related to the entry or feed.
self: the feed itself.
via: the source of the information provided in the entry.

type indicates the media type of the resource.

hreflang indicates the language of the referenced resource.

title conveys human-readable information about the link, typically for display purposes.

length indicates the length of the resource, in bytes.

Person

<author> and <contributor> describe a person, corporation, or similar entity. Each has one required element, name, and two optional elements: uri and email.

<name> conveys a human-readable name for the person.

<uri> contains a home page for the person.

<email> contains an email address for the person.

Text

<title>, <summary>, <content>, and <rights> contain human-readable text, usually in small quantities. The type attribute determines how this information is encoded (default="text")

If type="text", then this element contains plain text with no entity escaped html.

<title type="text">AT&amp;T bought by SBC!</title>

If type="html", then this element contains entity escaped html.

<title type="html">
  AT&amp;amp;T bought &lt;b&gt;by SBC&lt;/b&gt;!
</title>

If type="xhtml", then this element contains inline xhtml, wrapped in a div element.

<title type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    AT&amp;T bought <b>by SBC</b>!
  </div>
</title>
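
The type="html" case is the one most often gotten wrong by hand, because the HTML markup has to be escaped once more to sit inside XML. A minimal Python sketch, assuming the feed is built with the standard library's ElementTree (which escapes text on output):

```python
import xml.etree.ElementTree as ET

# The HTML we want a reader to render: "AT&T bought <b>by SBC</b>!"
markup = "AT&amp;T bought <b>by SBC</b>!"

title = ET.Element("title", type="html")
title.text = markup                      # stored as-is; escaped when serialized
serialized = ET.tostring(title, encoding="unicode")
print(serialized)

# Parsing the element back yields the original HTML markup again.
round_trip = ET.fromstring(serialized).text
print(round_trip == markup)
```

The serialized form contains &amp;amp; and &lt;b&gt;, matching the hand-written type="html" example above.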

For the complete Atom 1.0 specification, see RFC 4287.

Atom 1.0 Example Feed:

Here is an example feed file that shows how to write a feed using Atom 1.0:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Example Feed</title>
  <subtitle>Insert witty or insightful remark here</subtitle>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author>
     <name>Mohtashim</name>
     <email>mohtashim@example.com</email>
  </author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

  <entry>
     <title>Tutorial on Atom</title>
     <link href="http://example.org/2003/12/13/atom03"/>
     <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
     <updated>2003-12-13T18:30:02Z</updated>
     <summary>Some text.</summary>
   </entry>

</feed>
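
Reading such a feed programmatically mostly comes down to handling the Atom namespace. The sketch below parses a shortened copy of the example feed with Python's standard library; the function name read_entries and the dictionary keys are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix map for the Atom namespace declared on the <feed> element.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

ATOM_FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>Tutorial on Atom</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
</feed>"""

def read_entries(feed_xml):
    """Return one dict per <entry> with its title, link href and updated time."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        entries.append({
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "href": link.get("href") if link is not None else None,
            "updated": entry.findtext("atom:updated", namespaces=ATOM),
        })
    return entries

entries = read_entries(ATOM_FEED)
print(entries)
```

Note that the link is read from the href attribute, not from element text, which is the main structural difference from an RSS item link.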

Atom 1.0 File Extension:

An Atom 1.0 document does not require a specific file extension, but .xml is recommended.

RSS Further Extensions

RSS originated in 1999, and has strived to be a simple, easy to understand format, with relatively modest goals. After it became a popular format, developers wanted to extend it using modules defined in namespaces, as specified by the W3C.

RSS 2.0 adds that capability, following a simple rule: an RSS feed may contain elements not described in the RSS 2.0 specification only if those elements are defined in a namespace.

The core RSS elements are not themselves members of a namespace, so that RSS 2.0 can remain compatible with previous versions in the following sense: a version 0.91 or 0.92 file is also a valid 2.0 file. If the elements of RSS 2.0 were in a namespace, this constraint would break, and a version 0.9x file would not be a valid 2.0 file.
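
As an illustration of the namespace rule, the Python sketch below adds a creator element from the Dublin Core namespace to an otherwise plain RSS 2.0 item. Dublin Core is used only as a familiar example of an extension namespace; the feed values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Extension elements must live in a namespace; Dublin Core is one common choice.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example News"
ET.SubElement(channel, "link").text = "http://example.org/"
ET.SubElement(channel, "description").text = "Hypothetical feed"

item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "First story"               # core element, no namespace
ET.SubElement(item, "{%s}creator" % DC).text = "A. Writer"      # namespaced extension

extended = ET.tostring(rss, encoding="unicode")
print(extended)
```

The core elements stay unprefixed, so a reader that knows nothing about the extension can still process the feed and skip the dc:creator element.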

RSS is by no means a perfect format, but it is very popular and widely supported. Having a settled spec is something RSS has needed for a long time.

However, the RSS spec is, for all practical purposes, frozen at version 2.0.1. Versions 2.0.2 or 2.0.3 may follow, but only to clarify the specification, not to add new features to the format.

Subsequent work should happen in modules, using namespaces, and in completely new syndication formats, with new names.

Monday 9 December 2013

Mongo DB

MongoDB Tutorial

MongoDB is an open-source document database and a leading NoSQL database. MongoDB is written in C++.

This tutorial covers the MongoDB concepts needed to create and deploy a highly scalable, performance-oriented database.

MongoDB Overview

MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. MongoDB works on the concepts of collections and documents.

Database

Database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server typically has multiple databases.

Collection

Collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema. Documents within a collection can have different fields. Typically, all documents in a collection are of similar or related purpose.

Document

A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.

The table below shows how RDBMS terminology maps to MongoDB terminology.

RDBMS	MongoDB
Database	Database
Table	Collection
Tuple/Row	Document
column	Field
Table Join	Embedded Documents
Primary Key	Primary Key (Default key _id provided by mongodb itself)
Database Server and Client
Mysqld/Oracle	mongod
mysql/sqlplus	mongo

Sample document

The example below shows the document structure of a blog site, which is simply a set of comma-separated key-value pairs.

{
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by: 'only for programmers',
   url: 'http://www.only4programmers.blogspot.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100, 
   comments: [	
      {
         user:'user1',
         message: 'My first comment',
         dateCreated: new Date(2011,1,20,2,15),
         like: 0 
      },
      {
         user:'user2',
         message: 'My second comments',
         dateCreated: new Date(2011,1,25,7,45),
         like: 5
      }
   ]
}

_id is a 12-byte value, written in hexadecimal, that assures the uniqueness of every document. You can provide _id while inserting the document. If you don't, MongoDB provides a unique id for every document. Of these 12 bytes, the first 4 bytes hold the current timestamp, the next 3 bytes the machine id, the next 2 bytes the process id of the MongoDB server, and the remaining 3 bytes a simple incremental value.
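
The byte layout described above can be checked by splitting an ObjectId by hand. The Python sketch below does not use a MongoDB driver; it simply slices the 12 bytes according to the 4/3/2/3 layout given in this tutorial. The function name split_object_id and the sample id are assumptions for illustration.

```python
from datetime import datetime, timezone

def split_object_id(hex_id):
    """Split a 24-character hex ObjectId into the four parts described above."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    return {
        # First 4 bytes: seconds since the Unix epoch, big-endian.
        "timestamp": datetime.fromtimestamp(
            int.from_bytes(raw[0:4], "big"), tz=timezone.utc),
        "machine": raw[4:7].hex(),                    # next 3 bytes
        "process": int.from_bytes(raw[7:9], "big"),   # next 2 bytes
        "counter": int.from_bytes(raw[9:12], "big"),  # last 3 bytes
    }

parts = split_object_id("507f1f77bcf86cd799439011")  # sample id for illustration
print(parts["timestamp"].isoformat())  # 2012-10-17T21:13:27+00:00
```

Because the timestamp comes first, sorting documents by _id roughly orders them by creation time.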

MongoDB Advantages

A relational database typically has a schema design that shows the number of tables and the relationships between them, whereas MongoDB has no concept of relationships.

Advantages of MongoDB over RDBMS

Schema-less : MongoDB is a document database in which one collection can hold different documents. The number of fields, content and size of a document can differ from one document to another.
Structure of a single object is clear
No complex joins
Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL
Tuning
Ease of scale-out: MongoDB is easy to scale
Conversion / mapping of application objects to database objects not needed
Uses internal memory for storing the (windowed) working set, enabling faster access of data

Why use MongoDB?

Document Oriented Storage : Data is stored in the form of JSON style documents
Index on any attribute
Replication & High Availability
Auto-Sharding
Rich Queries
Fast In-Place Updates
Professional Support By MongoDB

Where to use MongoDB?

Big Data
Content Management and Delivery
Mobile and Social Infrastructure
User Data Management
Data Hub

MongoDB Environment

Install MongoDB On Windows

To install MongoDB on Windows, first download the latest release of MongoDB from http://www.mongodb.org/downloads. Make sure you get the correct version of MongoDB for your Windows version. To find your Windows architecture, open a command prompt and execute the following command:


C:\>wmic os get osarchitecture
OSArchitecture
64-bit
C:\>

32-bit versions of MongoDB only support databases smaller than 2GB and are suitable only for testing and evaluation purposes.

Now extract the downloaded file to the c:\ drive or any other location. Make sure the name of the extracted folder is mongodb-win32-i386-[version] or mongodb-win32-x86_64-[version], where [version] is the version of the MongoDB download.

Now open command prompt and run the following command

C:\>move mongodb-win64-* mongodb
      1 dir(s) moved.
C:\>

If you extracted MongoDB to a different location, go to that path using the command cd FOLDER/DIR and then run the command given above.

MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is c:\data\db. So you need to create this folder using the Command Prompt. Execute the following command sequence

C:\>md data
C:\>md data\db

If you installed MongoDB at a different location, you need to specify an alternate path for \data\db by setting the dbpath option when starting mongod.exe. To do so, issue the following commands.

In the command prompt, navigate to the bin directory inside the MongoDB installation folder. Suppose the installation folder is D:\set up\mongodb

 
C:\Users\XYZ>d:
D:\>cd "set up"
D:\set up>cd mongodb
D:\set up\mongodb>cd bin
D:\set up\mongodb\bin>mongod.exe --dbpath "d:\set up\mongodb\data"

A "waiting for connections" message on the console output indicates that the mongod.exe process is running successfully.

Now, to run the MongoDB shell, open another command prompt and issue the following command:

 
D:\set up\mongodb\bin>mongo.exe
MongoDB shell version: 2.4.6
connecting to: test
>db.test.save( { a: 1 } )
>db.test.find()
{ "_id" : ObjectId(5879b0f65a56a454), "a" : 1 }
>

This shows that MongoDB is installed and running successfully. The next time you run MongoDB, you only need to issue these commands:

 
D:\set up\mongodb\bin>mongod.exe --dbpath "d:\set up\mongodb\data" 
D:\set up\mongodb\bin>mongo.exe

Install MongoDB on Ubuntu

Run the following command to import the MongoDB public GPG Key:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

Create a /etc/apt/sources.list.d/mongodb.list file using the following command.

echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list

Now issue the following command to update the repository:

sudo apt-get update

Now install MongoDB using the following command:

sudo apt-get install mongodb-10gen=2.2.3

In the above installation, 2.2.3 is the MongoDB version being installed. Make sure to always install the latest version. MongoDB is now installed successfully.

Start MongoDB

sudo service mongodb start

Stop MongoDB

sudo service mongodb stop

Restart MongoDB

sudo service mongodb restart

To use MongoDB, run the following command:

mongo

This will connect you to the running mongod instance.

MongoDB Help

To get a list of commands, type db.help() in the MongoDB client. This prints the list of available commands.

MongoDB Statistics

To get statistics about the MongoDB server, type the command db.stats() in the MongoDB client. This shows the database name and the number of collections and documents in the database.

MongoDB Data Modelling

Data in MongoDB has a flexible schema. Documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.

Some considerations while designing schema in MongoDB

Design your schema according to user requirements.
Combine objects into one document if you will use them together. Otherwise, separate them (but make sure there is no need for joins).
Duplicate data (within limits), because disk space is cheap compared to compute time.
Do joins on write, not on read.
Optimize your schema for the most frequent use cases.
Do complex aggregation in the schema.

Example

Suppose a client needs a database design for a blog website; this lets us see the differences between RDBMS and MongoDB schema design. The website has the following requirements:

Every post has a unique title, description and url.
Every post can have one or more tags.
Every post has the name of its publisher and the total number of likes.
Every post has comments given by users along with their name, message, date-time and likes.
On each post there can be zero or more comments.

In an RDBMS, the schema design for the above requirements will need a minimum of three tables.

In MongoDB, the schema design will have a single collection, post, with the following structure:

{
   _id: POST_ID,
   title: TITLE_OF_POST, 
   description: POST_DESCRIPTION,
   by: POST_BY,
   url: URL_OF_POST,
   tags: [TAG1, TAG2, TAG3],
   likes: TOTAL_LIKES, 
   comments: [	
      {
         user:'COMMENT_BY',
         message: TEXT,
         dateCreated: DATE_TIME,
         like: LIKES 
      },
      {
         user:'COMMENT_BY',
         message: TEXT,
         dateCreated: DATE_TIME,
         like: LIKES
      }
   ]
}

So when displaying the data, in an RDBMS you need to join three tables, while in MongoDB the data comes from one collection only.
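The difference can be sketched in plain JavaScript, with arrays standing in for tables and for the post collection. This is only an illustration of the two read paths, not database code; the names posts, tags, comments, postCollection, loadPostWithJoins and loadEmbeddedPost are made up for this example.

```javascript
// RDBMS-style: three separate "tables" (plain arrays here), linked by post_id.
const posts = [{ id: 1, title: "MongoDB Overview", likes: 100 }];
const tags = [{ post_id: 1, tag: "mongodb" }, { post_id: 1, tag: "NoSQL" }];
const comments = [{ post_id: 1, user: "user1", message: "My first comment" }];

// Reading one post means combining rows from all three tables.
function loadPostWithJoins(id) {
  const post = posts.find((p) => p.id === id);
  return {
    ...post,
    tags: tags.filter((t) => t.post_id === id).map((t) => t.tag),
    comments: comments
      .filter((c) => c.post_id === id)
      .map(({ user, message }) => ({ user, message })),
  };
}

// MongoDB-style: the same information already lives in one document.
const postCollection = [
  {
    id: 1,
    title: "MongoDB Overview",
    likes: 100,
    tags: ["mongodb", "NoSQL"],
    comments: [{ user: "user1", message: "My first comment" }],
  },
];

function loadEmbeddedPost(id) {
  return postCollection.find((p) => p.id === id);
}

// Both read paths produce the same post.
console.log(JSON.stringify(loadPostWithJoins(1)) === JSON.stringify(loadEmbeddedPost(1)));
```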

MongoDB Create Database

The use Command

The MongoDB command use DATABASE_NAME is used to create a database. The command creates a new database if it doesn't exist; otherwise it returns the existing database.

SYNTAX:

Basic syntax of use DATABASE statement is as follows:

use DATABASE_NAME

EXAMPLE:

If you want to create a database with the name mydb, the use DATABASE statement would be as follows:

>use mydb
switched to db mydb

To check your currently selected database use the command db

>db
mydb

If you want to check your databases list, then use the command show dbs.

>show dbs
local     0.78125GB
test      0.23012GB

Your newly created database (mydb) is not present in the list. To display the database, you need to insert at least one document into it.

>db.movie.insert({"name":"only4programmers"})
>show dbs
local      0.78125GB
mydb       0.23012GB
test       0.23012GB

In MongoDB the default database is test. If you don't create any database, collections will be stored in the test database.

MongoDB Drop Database

The dropDatabase() Method

The MongoDB db.dropDatabase() command is used to drop an existing database.

SYNTAX:

Basic syntax of dropDatabase() command is as follows:

db.dropDatabase()

This will delete the selected database. If you have not selected any database, it will delete the default 'test' database.

EXAMPLE:

First, check the list of available databases using the command show dbs

>show dbs
local      0.78125GB
mydb       0.23012GB
test       0.23012GB
>

If you want to delete the new database mydb, the dropDatabase() command would be as follows:

>use mydb
switched to db mydb
>db.dropDatabase()
{ "dropped" : "mydb", "ok" : 1 }
>

Now check list of databases

>show dbs
local      0.78125GB
test       0.23012GB
>

MongoDB Create Collection

The createCollection() Method

The MongoDB method db.createCollection(name, options) is used to create a collection.

SYNTAX:

Basic syntax of createCollection() command is as follows

db.createCollection(name, options)

In the command, name is the name of the collection to be created. options is a document used to specify the configuration of the collection.

Parameter	Type	Description
Name	String	Name of the collection to be created
Options	Document	(Optional) Specify options about memory size and indexing

The options parameter is optional, so you only need to specify the name of the collection. Following is the list of options you can use:

Field	Type	Description
capped	Boolean	(Optional) If true, enables a capped collection. A capped collection is a fixed-size collection that automatically overwrites its oldest entries when it reaches its maximum size. If you specify true, you need to specify the size parameter also.
autoIndexId	Boolean	(Optional) If true, automatically creates an index on the _id field. Default value is false.
size	number	(Optional) Specifies a maximum size in bytes for a capped collection. If capped is true, then you need to specify this field also.
max	number	(Optional) Specifies the maximum number of documents allowed in the capped collection.

When inserting a document, MongoDB first checks the size field of the capped collection, then it checks the max field.
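The overwrite behaviour of a capped collection can be pictured with a toy model in plain JavaScript. This is only a sketch of the idea, assuming a limit on the number of documents (the max option); it ignores the size limit in bytes and is not how MongoDB implements capped collections. The class name CappedList is made up for this example.

```javascript
// Toy model of a capped collection limited by document count ("max").
// Real capped collections are also limited by "size" in bytes; that part is omitted here.
class CappedList {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once the limit is exceeded, drop the oldest entry (insertion order).
    if (this.docs.length > this.max) {
      this.docs.shift();
    }
  }
}

const log = new CappedList(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ n: i });
}
console.log(log.docs); // the two oldest documents have been overwritten
```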

EXAMPLES:

Basic syntax of createCollection() method without options is as follows

>use test
switched to db test
>db.createCollection("mycollection")
{ "ok" : 1 }
>

You can check the created collection by using the command show collections

>show collections
mycollection
system.indexes

The following example shows the syntax of the createCollection() method with a few important options:

>db.createCollection("mycol", { capped : true, autoIndexId : true, size : 6142800, max : 10000 } )
{ "ok" : 1 }
>

In MongoDB you don't need to create a collection explicitly. MongoDB creates the collection automatically when you insert a document.

>db.only4programmers.insert({"name" : "only4programmers"})
>show collections
mycol
mycollection
system.indexes
only4programmers
>

MongoDB Drop Collection

The drop() Method

MongoDB's db.collection.drop() is used to drop a collection from the database.

SYNTAX:

Basic syntax of drop() command is as follows

db.COLLECTION_NAME.drop()

EXAMPLE:

First, check the available collections in your database mydb

>use mydb
switched to db mydb
>show collections
mycol
mycollection
system.indexes
only4programmers
>

Now drop the collection with the name mycollection

>db.mycollection.drop()
true
>

Again check the list of collections in the database

>show collections
mycol
system.indexes
only4programmers
>

The drop() method returns true if the selected collection is dropped successfully; otherwise it returns false.

MongoDB Datatypes

MongoDB supports many datatypes, listed below:

String : This is the most commonly used datatype to store data. Strings in MongoDB must be valid UTF-8.
Integer : This type is used to store a numerical value. An integer can be 32-bit or 64-bit depending upon your server.
Boolean : This type is used to store a boolean (true/ false) value.
Double : This type is used to store floating point values.
Min/ Max keys : This type is used to compare a value against the lowest and highest BSON elements.
Arrays : This type is used to store arrays, lists or multiple values in one key.
Timestamp : This type is used to store a timestamp. This can be handy for recording when a document has been modified or added.
Object : This datatype is used for embedded documents.
Null : This type is used to store a Null value.
Symbol : This datatype is used identically to a string; however, it's generally reserved for languages that use a specific symbol type.
Date : This datatype is used to store the current date or time in UNIX time format. You can specify your own date and time by creating a Date object and passing the year, month and day into it.
Object ID : This datatype is used to store the document’s ID.
Binary data : This datatype is used to store binary data.
Code : This datatype is used to store JavaScript code in a document.
Regular expression : This datatype is used to store regular expressions.
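Since the shell examples in this tutorial build dates with new Date(...), a short plain JavaScript sketch of the constructor's argument order may help. It uses only the standard JavaScript Date object; the variable name created is arbitrary.

```javascript
// JavaScript's Date constructor takes (year, monthIndex, day, hours, minutes),
// where monthIndex is zero-based: 0 is January, 1 is February, and so on.
const created = new Date(2011, 1, 20, 2, 15);

console.log(created.getFullYear()); // year as passed in
console.log(created.getMonth());    // zero-based month index
console.log(created.getDate());     // day of the month

// Internally a Date is a count of milliseconds since the UNIX epoch.
console.log(typeof created.getTime());
```

So the dateCreated value new Date(2011,1,20,2,15) used in the sample document refers to 20 February 2011, not 20 January.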

MongoDB - Insert Document

The insert() Method

To insert data into a MongoDB collection, you need to use MongoDB's insert() or save() method.

SYNTAX

Basic syntax of insert() command is as follows:

>db.COLLECTION_NAME.insert(document)

EXAMPLE

>db.mycol.insert({
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by: 'only for programmers',
   url: 'http://www.only4programmers.blogspot.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
})

Here mycol is our collection name, as created in the previous tutorial. If the collection doesn't exist in the database, MongoDB will create it and then insert the document into it.

If we don't specify the _id parameter in the inserted document, MongoDB assigns a unique ObjectId to the document.

_id is a 12-byte hexadecimal number, unique for every document in a collection. The 12 bytes are divided as follows:

_id: ObjectId(4 bytes timestamp, 3 bytes machine id, 2 bytes process id, 3 bytes incrementer)

To insert multiple documents in a single query, you can pass an array of documents to the insert() command.

EXAMPLE

>db.post.insert([
{
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by: 'only for programmers',
   url: 'http://www.only4programmers.blogspot.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   title: 'NoSQL Database', 
   description: "NoSQL database doesn't have tables",
   by: 'only for programmers',
   url: 'http://www.only4programmers.blogspot.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 20, 
   comments: [	
      {
         user:'user1',
         message: 'My first comment',
         dateCreated: new Date(2013,11,10,2,35),
         like: 0 
      }
   ]
}
])

You can also use db.post.save(document) to insert the document. If you don't specify _id in the document, the save() method works the same as the insert() method. If you specify _id, it replaces the whole document that has the _id specified in the save() call.
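The difference between insert() and save() can be modelled with a toy in-memory store in plain JavaScript. This is a sketch of the semantics described above, not MongoDB code: the Map named store, the numeric counter used in place of a generated ObjectId, and the functions insert and save are all made up for illustration.

```javascript
// Toy collection keyed by _id; the counter stands in for a generated ObjectId.
const store = new Map();
let nextId = 1;

// insert(): adds a new document and refuses an _id that already exists.
function insert(doc) {
  const _id = doc._id !== undefined ? doc._id : nextId++;
  if (store.has(_id)) throw new Error("duplicate key: " + _id);
  store.set(_id, { ...doc, _id });
  return _id;
}

// save(): without _id it behaves like insert(); with _id it replaces
// the whole stored document (or adds it if nothing has that _id yet).
function save(doc) {
  if (doc._id === undefined) return insert(doc);
  store.set(doc._id, { ...doc });
  return doc._id;
}

const id = insert({ title: "MongoDB Overview", likes: 100 });
save({ _id: id, title: "New Title" }); // replaces the document; likes is gone
console.log(store.get(id));
```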

MongoDB - Query Document

The find() Method

To query data from MongoDB collection, you need to use MongoDB's find() method.

SYNTAX

Basic syntax of find() method is as follows

>db.COLLECTION_NAME.find()

The find() method displays all the documents in a non-structured way.

The pretty() Method

To display the results in a formatted way, you can use pretty() method.

SYNTAX:

>db.mycol.find().pretty()

Example

>db.mycol.find().pretty()
{
   "_id": ObjectId(7df78ad8902c),
   "title": "MongoDB Overview",
   "description": "MongoDB is no sql database",
   "by": "only for programmers",
   "url": "http://www.only4programmers.blogspot.com",
   "tags": ["mongodb", "database", "NoSQL"],
   "likes": 100
}
>

Apart from the find() method, there is the findOne() method, which returns only one document.

RDBMS Where Clause Equivalents in MongoDB

To query documents on the basis of some condition, you can use the following operations:

Operation	Syntax	Example	RDBMS Equivalent
Equality	{<key>:<value>}	db.mycol.find({"by":"only for programmers"}).pretty()	where by = 'only for programmers'
Less Than	{<key>:{$lt:<value>}}	db.mycol.find({"likes":{$lt:50}}).pretty()	where likes < 50
Less Than Equals	{<key>:{$lte:<value>}}	db.mycol.find({"likes":{$lte:50}}).pretty()	where likes <= 50
Greater Than	{<key>:{$gt:<value>}}	db.mycol.find({"likes":{$gt:50}}).pretty()	where likes > 50
Greater Than Equals	{<key>:{$gte:<value>}}	db.mycol.find({"likes":{$gte:50}}).pretty()	where likes >= 50
Not Equals	{<key>:{$ne:<value>}}	db.mycol.find({"likes":{$ne:50}}).pretty()	where likes != 50
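To make the operators in the table concrete, here is a small plain JavaScript sketch that evaluates them against an in-memory array. It is an illustration only, not MongoDB's query engine; the names operators, findBy and sample are made up, and only a single key per query is supported.

```javascript
// Each query operator from the table, written as a plain predicate.
const operators = {
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $ne: (a, b) => a !== b,
};

// Supports {key: value} (equality) and {key: {$op: value}} for a single key.
function findBy(docs, key, condition) {
  if (condition === null || typeof condition !== "object") {
    return docs.filter((d) => d[key] === condition);
  }
  return docs.filter((d) =>
    Object.entries(condition).every(([op, value]) => operators[op](d[key], value))
  );
}

const sample = [
  { title: "MongoDB Overview", likes: 100 },
  { title: "NoSQL Overview", likes: 10 },
  { title: "Neo4j Overview", likes: 750 },
];

console.log(findBy(sample, "likes", { $lt: 50 }));   // where likes < 50
console.log(findBy(sample, "likes", { $gte: 100 })); // where likes >= 100
```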

AND in MongoDB

SYNTAX:

In the find() method, if you pass multiple keys separated by ',', MongoDB treats them as an AND condition. Basic syntax of AND is shown below:

>db.mycol.find({key1:value1, key2:value2}).pretty()

EXAMPLE

The following example shows all the tutorials written by 'only for programmers' whose title is 'MongoDB Overview'

>db.mycol.find({"by":"only for programmers","title": "MongoDB Overview"}).pretty()
{
   "_id": ObjectId(7df78ad8902c),
   "title": "MongoDB Overview",
   "description": "MongoDB is no sql database",
   "by": "only for programmers",
   "url": "http://www.only4programmers.blogspot.com",
   "tags": ["mongodb", "database", "NoSQL"],
   "likes": 100
}
>

For the above example, the equivalent where clause is ' where by='only for programmers' AND title='MongoDB Overview' '. You can pass any number of key-value pairs in the find clause.

OR in MongoDB

SYNTAX:

To query documents based on the OR condition, you need to use the $or keyword. Basic syntax of OR is shown below:

>db.mycol.find(
   {
      $or: [
         {key1: value1}, {key2:value2}
      ]
   }
).pretty()

EXAMPLE

The following example shows all the tutorials written by 'only for programmers' or whose title is 'MongoDB Overview'

>db.mycol.find({$or:[{"by":"only for programmers"},{"title": "MongoDB Overview"}]}).pretty()
{
   "_id": ObjectId(7df78ad8902c),
   "title": "MongoDB Overview",
   "description": "MongoDB is no sql database",
   "by": "only for programmers",
   "url": "http://www.only4programmers.blogspot.com",
   "tags": ["mongodb", "database", "NoSQL"],
   "likes": 100
}
>

Using AND and OR together

EXAMPLE

The following example shows the documents that have likes greater than 10 and whose title is 'MongoDB Overview' or whose by is 'only for programmers'. The equivalent SQL where clause is 'where likes>10 AND (by = 'only for programmers' OR title = 'MongoDB Overview')'

>db.mycol.find({"likes": {$gt:10}, $or: [{"by": "only for programmers"}, {"title": "MongoDB Overview"}] }).pretty()
{
   "_id": ObjectId(7df78ad8902c),
   "title": "MongoDB Overview",
   "description": "MongoDB is no sql database",
   "by": "only for programmers",
   "url": "http://www.only4programmers.blogspot.com",
   "tags": ["mongodb", "database", "NoSQL"],
   "likes": 100
}
>
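How AND and OR combine can be sketched with a small recursive matcher in plain JavaScript. This is an illustration under simplifying assumptions (only equality, $gt and $or are handled) and not MongoDB's implementation; the function name matches and the sample array tutorials are made up.

```javascript
// Minimal matcher: top-level keys are ANDed together, $or takes a list of
// sub-queries, and a value may be a literal (equality) or {$gt: n}.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") {
      return cond.some((sub) => matches(doc, sub));
    }
    if (cond !== null && typeof cond === "object" && "$gt" in cond) {
      return doc[key] > cond.$gt;
    }
    return doc[key] === cond;
  });
}

const tutorials = [
  { by: "only for programmers", title: "MongoDB Overview", likes: 100 },
  { by: "only for programmers", title: "NoSQL Overview", likes: 5 },
  { by: "someone else", title: "Other", likes: 500 },
];

// where likes > 10 AND (by = 'only for programmers' OR title = 'MongoDB Overview')
const query = {
  likes: { $gt: 10 },
  $or: [{ by: "only for programmers" }, { title: "MongoDB Overview" }],
};

console.log(tutorials.filter((doc) => matches(doc, query)));
```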

MongoDB Update Document

MongoDB's update() and save() methods are used to update a document in a collection. The update() method updates values in the existing document, while the save() method replaces the existing document with the document passed to save().

MongoDB Update() method

The update() method updates values in the existing document.

SYNTAX:

Basic syntax of update() method is as follows

>db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)

EXAMPLE

Consider that the mycol collection has the following data.

{ "_id" : ObjectId(5983548781331adf45ec5), "title":"MongoDB Overview"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers Overview"}

The following example sets the new title 'New MongoDB Tutorial' on the document whose title is 'MongoDB Overview'

>db.mycol.update({'title':'MongoDB Overview'},{$set:{'title':'New MongoDB Tutorial'}})
>db.mycol.find()
{ "_id" : ObjectId(5983548781331adf45ec5), "title":"New MongoDB Tutorial"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers Overview"}
>

By default MongoDB updates only a single document; to update multiple documents, you need to set the parameter 'multi' to true.

>db.mycol.update({'title':'MongoDB Overview'},{$set:{'title':'New MongoDB Tutorial'}},{multi:true})
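The effect of $set and the multi flag can be sketched on an in-memory array in plain JavaScript. This is a toy model under simplifying assumptions (equality criteria only, $set only), not MongoDB code; the function name update and the array col are made up for this example.

```javascript
// Toy update(): applies $set to the first matching document,
// or to every match when options.multi is true. Returns the number updated.
function update(docs, criteria, change, options = {}) {
  let updated = 0;
  for (const doc of docs) {
    const isMatch = Object.entries(criteria).every(([k, v]) => doc[k] === v);
    if (!isMatch) continue;
    Object.assign(doc, change.$set); // only the listed fields change
    updated++;
    if (!options.multi) break;
  }
  return updated;
}

const col = [
  { _id: 1, title: "MongoDB Overview", likes: 1 },
  { _id: 2, title: "MongoDB Overview", likes: 2 },
];

// Without multi, only the first matching document is changed.
const first = update(col, { title: "MongoDB Overview" }, { $set: { title: "New MongoDB Tutorial" } });
console.log(first, col);
```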

MongoDB Save() Method

The save() method replaces the existing document with the new document passed in save() method

SYNTAX

Basic syntax of mongodb save() method is shown below:

>db.COLLECTION_NAME.save({_id:ObjectId(),NEW_DATA})

EXAMPLE

Following example will replace the document with the _id '5983548781331adf45ec7'

>db.mycol.save(
   {
      "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers New Topic", "by":"only for programmers"
   }
)
>db.mycol.find()
{ "_id" : ObjectId(5983548781331adf45ec5), "title":"New MongoDB Tutorial"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers New Topic", "by":"only for programmers"}
>

MongoDB Delete Document

The remove() Method

MongoDB's remove() method is used to remove documents from a collection. The remove() method accepts two parameters: the deletion criteria and the justOne flag.

deletion criteria : (Optional) the criteria according to which documents will be removed.
justOne : (Optional) if set to true or 1, then remove only one document.

SYNTAX:

Basic syntax of remove() method is as follows

>db.COLLECTION_NAME.remove(DELETION_CRITERIA)

EXAMPLE

Consider that the mycol collection has the following data.

{ "_id" : ObjectId(5983548781331adf45ec5), "title":"MongoDB Overview"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers Overview"}

Following example will remove all the documents whose title is 'MongoDB Overview'

>db.mycol.remove({'title':'MongoDB Overview'})
>db.mycol.find()
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers Overview"}
>

Remove only one

If there are multiple records and you want to delete only the first record, set the justOne parameter in the remove() method

>db.COLLECTION_NAME.remove(DELETION_CRITERIA,1)

Remove All documents

If you don't specify deletion criteria, MongoDB deletes all documents from the collection. This is the equivalent of SQL's truncate command.

>db.mycol.remove()
>db.mycol.find()
>

MongoDB Projection

In MongoDB, projection means selecting only the necessary data rather than the whole of a document's data. If a document has 5 fields and you need to show only 3, select only those 3 fields.

The find() Method

MongoDB's find() method, explained in MongoDB Query Document, accepts a second optional parameter: the list of fields that you want to retrieve. When you execute the find() method, it displays all fields of a document. To limit this, you need to set a list of fields with value 1 or 0: 1 is used to show the field while 0 is used to hide it.

SYNTAX:

Basic syntax of find() method with projection is as follows

>db.COLLECTION_NAME.find({},{KEY:1})

EXAMPLE

Consider that the collection mycol has the following data

{ "_id" : ObjectId(5983548781331adf45ec5), "title":"MongoDB Overview"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers Overview"}

The following example displays only the title of each document when querying.

>db.mycol.find({},{"title":1,_id:0})
{"title":"MongoDB Overview"}
{"title":"NoSQL Overview"}
{"title":"only for programmers Overview"}
>

Please note that the _id field is always displayed when executing the find() method; if you don't want this field, you need to set it to 0.
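The projection rule, including the special handling of _id, can be sketched in plain JavaScript. This toy only covers inclusion lists such as {title: 1, _id: 0} and is not MongoDB's implementation; the function name project and the sample document doc (with a placeholder url) are made up for illustration.

```javascript
// Toy projection for inclusion lists such as {title: 1, _id: 0}.
// _id is kept unless it is explicitly set to 0, as described above.
function project(doc, fields) {
  const out = {};
  if (fields._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [key, flag] of Object.entries(fields)) {
    if (key !== "_id" && flag === 1 && key in doc) out[key] = doc[key];
  }
  return out;
}

const doc = { _id: 7, title: "MongoDB Overview", likes: 100, url: "http://example.com" };

console.log(project(doc, { title: 1 }));         // _id is still included
console.log(project(doc, { title: 1, _id: 0 })); // only title remains
```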

MongoDB Limit Records

The Limit() Method

To limit the records in MongoDB, you need to use the limit() method. It accepts one number-type argument, which is the number of documents that you want displayed.

SYNTAX:

Basic syntax of limit() method is as follows

>db.COLLECTION_NAME.find().limit(NUMBER)

EXAMPLE

Consider that the collection mycol has the following data

{ "_id" : ObjectId(5983548781331adf45ec5), "title":"MongoDB Overview"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers Overview"}

The following example displays only 2 documents when querying.

>db.mycol.find({},{"title":1,_id:0}).limit(2)
{"title":"MongoDB Overview"}
{"title":"NoSQL Overview"}
>

If you don't specify the number argument in the limit() method, it displays all documents from the collection.

MongoDB Skip() Method

Apart from the limit() method, there is one more method, skip(), which also accepts a number-type argument and is used to skip that number of documents.

SYNTAX:

Basic syntax of skip() method is as follows

>db.COLLECTION_NAME.find().limit(NUMBER).skip(NUMBER)

EXAMPLE:

The following example displays only the second document.

>db.mycol.find({},{"title":1,_id:0}).limit(1).skip(1)
{"title":"NoSQL Overview"}
>

Please note that the default value in the skip() method is 0.
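The way skip() and limit() work together can be sketched with array slicing in plain JavaScript. This is an illustration on an in-memory array rather than a cursor; the function name page and the array titles are made up for this example.

```javascript
// skip is applied before limit, regardless of the order in which the two are chained.
function page(docs, { skip = 0, limit = 0 } = {}) {
  const rest = docs.slice(skip);
  // A limit of 0 is treated as "no limit".
  return limit > 0 ? rest.slice(0, limit) : rest;
}

const titles = ["MongoDB Overview", "NoSQL Overview", "only for programmers Overview"];

console.log(page(titles, { limit: 1, skip: 1 })); // second document only
console.log(page(titles, { limit: 2 }));          // first two documents
```

Paging through results is then a matter of calling page with skip set to pageNumber * pageSize and limit set to pageSize.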

MongoDB Sort Documents

The sort() Method

To sort documents in MongoDB, you need to use the sort() method. It accepts a document containing a list of fields along with their sorting order. To specify the sorting order, 1 and -1 are used: 1 is used for ascending order while -1 is used for descending order.

SYNTAX:

Basic syntax of sort() method is as follows

>db.COLLECTION_NAME.find().sort({KEY:1})

EXAMPLE

Consider that the collection mycol has the following data

{ "_id" : ObjectId(5983548781331adf45ec5), "title":"MongoDB Overview"}
{ "_id" : ObjectId(5983548781331adf45ec6), "title":"NoSQL Overview"}
{ "_id" : ObjectId(5983548781331adf45ec7), "title":"only for programmers Overview"}

Following example will display the documents sorted by title in descending order.

>db.mycol.find({},{"title":1,_id:0}).sort({"title":-1})
{"title":"only for programmers Overview"}
{"title":"NoSQL Overview"}
{"title":"MongoDB Overview"}
>

Please note if you don't specify the sorting preference, then sort() method will display documents in ascending order.
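The meaning of 1 and -1 in a sort document can be sketched as a comparator in plain JavaScript. This is an illustration on an in-memory array using JavaScript's own comparison of values, which may differ from MongoDB's ordering rules for mixed types; the function name bySpec is made up.

```javascript
// Build a comparator from a sort document such as {title: -1} or {likes: -1, title: 1}.
function bySpec(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [key, direction] of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
    }
    return 0;
  };
}

const docs = [
  { title: "MongoDB Overview" },
  { title: "NoSQL Overview" },
  { title: "only for programmers Overview" },
];

// Copy before sorting so the original order is preserved.
const descending = [...docs].sort(bySpec({ title: -1 }));
console.log(descending.map((d) => d.title));
```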

MongoDB Indexing

Indexes support the efficient resolution of queries. Without indexes, MongoDB must scan every document of a collection to select those documents that match the query statement. This scan is highly inefficient and requires mongod to process a large volume of data.

Indexes are special data structures that store a small portion of the data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field as specified in the index.

The ensureIndex() Method

To create an index you need to use ensureIndex() method of mongodb.

SYNTAX:

Basic syntax of ensureIndex() method is as follows

>db.COLLECTION_NAME.ensureIndex({KEY:1})

Here key is the name of the field on which you want to create the index and 1 is for ascending order. To create an index in descending order you need to use -1.

EXAMPLE

>db.mycol.ensureIndex({"title":1})
>

In the ensureIndex() method you can pass multiple fields to create an index on multiple fields.

>db.mycol.ensureIndex({"title":1,"description":-1})
>

The ensureIndex() method also accepts a list of options (all optional), given below:

Parameter	Type	Description
background	Boolean	Builds the index in the background so that building an index does not block other database activities. Specify true to build in the background. The default value is false.
unique	Boolean	Creates a unique index so that the collection will not accept insertion of documents where the index key or keys match an existing value in the index. Specify true to create a unique index. The default value is false.
name	string	The name of the index. If unspecified, MongoDB generates an index name by concatenating the names of the indexed fields and the sort order.
dropDups	Boolean	Creates a unique index on a field that may have duplicates. MongoDB indexes only the first occurrence of a key and removes all documents from the collection that contain subsequent occurrences of that key. Specify true to create unique index. The default value is false.
sparse	Boolean	If true, the index only references documents with the specified field. These indexes use less space but behave differently in some situations (particularly sorts). The default value is false.
expireAfterSeconds	integer	Specifies a value, in seconds, as a TTL to control how long MongoDB retains documents in this collection.
v	index version	The index version number. The default index version depends on the version of mongod running when creating the index.
weights	document	The weight is a number ranging from 1 to 99,999 and denotes the significance of the field relative to the other indexed fields in terms of the score.
default_language	string	For a text index, the language that determines the list of stop words and the rules for the stemmer and tokenizer. The default value is english.
language_override	string	For a text index, specify the name of the field in the document that contains the language to override the default language. The default value is language.

MongoDB Aggregation

Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. In SQL, count(*) with group by is the equivalent of MongoDB aggregation.

The aggregate() Method

For aggregation in MongoDB, you should use the aggregate() method.

SYNTAX:

Basic syntax of aggregate() method is as follows

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

EXAMPLE:

In the collection you have the following data:

{
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by_user: 'only for programmers',
   url: 'http://www.only4programmers.blogspot.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d),
   title: 'NoSQL Overview', 
   description: 'No sql database is very fast',
   by_user: 'only for programmers',
   url: 'http://www.only4programmers.blogspot.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e),
   title: 'Neo4j Overview', 
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
},

Now, from the above collection, if you want to display a list of how many tutorials were written by each user, use the aggregate() method as shown below:

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
   "result" : [
      {
         "_id" : "only for programmers",
         "num_tutorial" : 2
      },
      {
         "_id" : "Neo4j",
         "num_tutorial" : 1
      }
   ],
   "ok" : 1
}
>

The SQL equivalent query for the above use case is select by_user, count(*) from mycol group by by_user

In the above example, we grouped documents by the field by_user, and on each occurrence of by_user the previous value of the sum is incremented. The following is a list of available aggregation expressions.

Expression	Description	Example
$sum	Sums up the defined value from all documents in the collection.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg	Calculates the average of all given values from all documents in the collection.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min	Gets the minimum of the corresponding values from all documents in the collection.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max	Gets the maximum of the corresponding values from all documents in the collection.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push	Inserts the value to an array in the resulting document.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet	Inserts the value to an array in the resulting document but does not create duplicates.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first	Gets the value from the first document in each group. Typically this only makes sense together with a previously applied “$sort” stage.	db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last	Gets the value from the last document in each group. Typically this only makes sense together with a previously applied “$sort” stage.	db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])
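The difference between the expressions in the table is easier to see side by side. The following plain-JavaScript sketch applies a rough equivalent of each one to the values collected for a group; groupAccumulate is a hypothetical helper written for this illustration, not a MongoDB function.

```javascript
// Illustrative helper (not a MongoDB API): groups documents by `keyField`
// and applies a rough equivalent of each accumulator to `valueField`.
function groupAccumulate(documents, keyField, valueField) {
  const groups = new Map();
  for (const doc of documents) {
    if (!groups.has(doc[keyField])) groups.set(doc[keyField], []);
    groups.get(doc[keyField]).push(doc[valueField]);
  }
  return Array.from(groups, ([_id, values]) => {
    const total = values.reduce((a, b) => a + b, 0);
    return {
      _id,
      sum: total,                      // like $sum
      avg: total / values.length,      // like $avg
      min: Math.min(...values),        // like $min
      max: Math.max(...values),        // like $max
      push: values,                    // like $push (duplicates kept)
      addToSet: [...new Set(values)],  // like $addToSet (duplicates dropped)
      first: values[0],                // like $first (input order)
      last: values[values.length - 1], // like $last (input order)
    };
  });
}

console.log(groupAccumulate(
  [{ by_user: "a", likes: 10 }, { by_user: "a", likes: 10 }, { by_user: "a", likes: 40 }],
  "by_user",
  "likes"
));
```

Note that MongoDB does not guarantee the order of elements produced by $addToSet; the sketch happens to keep insertion order.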

Pipeline Concept

In a UNIX command shell, a pipeline means executing an operation on some input and using its output as the input for the next command, and so on. MongoDB supports the same concept in its aggregation framework. There is a set of possible stages, each of which takes a set of documents as input and produces a resulting set of documents (or the final resulting JSON document at the end of the pipeline). This output can in turn be used as the input for the next stage, and so on.

The possible stages in the aggregation framework are the following:

$project: Used to select some specific fields from a collection.
$match: This is a filtering operation, so it can reduce the number of documents that are given as input to the next stage.
$group: This does the actual aggregation as discussed above.
$sort: Sorts the documents.
$skip: With this it is possible to skip forward in the list of documents by a given number of documents.
$limit: This limits the number of documents to look at to the given number, starting from the current position.
$unwind: This is used to unwind documents that use arrays. When an array is used, the data is in a sense pre-joined, and this operation undoes that to produce individual documents again. Thus this stage increases the number of documents for the next stage.
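The way stages feed into one another can be sketched in plain JavaScript, with each stage modelled as a function from an array of documents to a new array. All helper names below (matchStage, unwindStage, countByStage, sortDescStage, limitStage, runPipeline) and the sample data are assumptions made for this illustration; in the mongo shell the corresponding pipeline would be written with $match, $unwind, $group, $sort and $limit.

```javascript
// Each stage takes an array of documents and returns a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const unwindStage = (field) => (docs) =>
  // One output document per array element, so the document count grows.
  docs.flatMap((doc) => doc[field].map((value) => ({ ...doc, [field]: value })));
const countByStage = (field) => (docs) => {
  const counts = new Map();
  for (const doc of docs) counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  return Array.from(counts, ([_id, count]) => ({ _id, count }));
};
const sortDescStage = (field) => (docs) => [...docs].sort((a, b) => b[field] - a[field]);
const limitStage = (n) => (docs) => docs.slice(0, n);

// The output of every stage becomes the input of the next one.
const runPipeline = (docs, stages) => stages.reduce((current, stage) => stage(current), docs);

// Assumed sample data, loosely modelled on the mycol documents.
const tutorials = [
  { title: "A", tags: ["mongodb", "database"], likes: 100 },
  { title: "B", tags: ["mongodb", "NoSQL"], likes: 10 },
  { title: "C", tags: ["neo4j", "database"], likes: 750 },
];

// Most frequent tag among tutorials with at least 50 likes.
console.log(runPipeline(tutorials, [
  matchStage((doc) => doc.likes >= 50),
  unwindStage("tags"),
  countByStage("tags"),
  sortDescStage("count"),
  limitStage(1),
]));
```

Placing the filtering stage first mirrors the usual advice for real pipelines: the earlier documents are discarded, the less work later stages have to do.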

MongoDB Replication

Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability by keeping multiple copies of the data on different database servers, which protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.

Why Replication?

To keep your data safe
High (24*7) availability of data
Disaster Recovery
No downtime for maintenance (like backups, index rebuilds, compaction)
Read scaling (extra copies to read from)
Replica set is transparent to the application

How replication works in MongoDB

MongoDB achieves replication through the use of replica sets. A replica set is a group of mongod instances that host the same data set. In a replica set, one node is the primary node, which receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set. A replica set can have only one primary node.

A replica set is a group of two or more nodes (generally a minimum of 3 nodes is required).
In a replica set, one node is the primary node and the remaining nodes are secondaries.
All data replicates from the primary to the secondary nodes.
At the time of automatic failover or maintenance, an election is held and a new primary node is elected.
After a failed node recovers, it rejoins the replica set and works as a secondary node.
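The recommendation of at least 3 nodes follows from how elections work: a new primary can only be elected when a majority of the voting members can reach each other. The arithmetic can be sketched as below; the function names are illustrative only and are not MongoDB commands.

```javascript
// Number of votes needed to elect a primary: more than half of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members can fail while an election is still possible.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// True when the members that can still reach each other form a majority.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

for (const n of [2, 3, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

With 2 members the majority is 2, so losing either node leaves the set without a primary; with 3 members one node can fail and the remaining two can still elect a primary.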

A typical diagram of MongoDB replication shows the client application interacting with the primary node, and the primary node then replicating the data to the secondary nodes.

Replica set features

A cluster of N nodes
Any one node can be primary
All write operations go to the primary
Automatic failover
Automatic Recovery
Consensus election of primary

Set up a replica set

In this tutorial we will convert a standalone mongod instance to a replica set. To convert it to a replica set, follow the steps given below:

Shut down the already running MongoDB server.

Now start the MongoDB server again, specifying the --replSet option. The basic syntax of --replSet is given below:

mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"

EXAMPLE

mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0

This starts a mongod instance on port 27017 as a member of the replica set named rs0. Now open a command prompt and connect to this mongod instance. In the mongo client, issue the command rs.initiate() to initiate a new replica set. To check the replica set configuration, issue the command rs.conf(). To check the status of the replica set, issue the command rs.status().

Add members to replica set

To add members to the replica set, start mongod instances on multiple machines. Now start a mongo client and issue the command rs.add().

SYNTAX:

Basic syntax of rs.add() command is as follows:

>rs.add("HOST_NAME:PORT")

EXAMPLE

Suppose the host name of your mongod instance is mongod1.net and it is running on port 27017. To add this instance to the replica set, issue the command rs.add() in the mongo client.

>rs.add("mongod1.net:27017")
>

You can add a mongod instance to the replica set only when you are connected to the primary node. To check whether you are connected to the primary, issue the command db.isMaster() in the mongo client.
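As an alternative to adding members one at a time, rs.initiate() also accepts a configuration document that lists all members up front. The sketch below only builds such a document in plain JavaScript; buildReplSetConfig is a hypothetical helper, and the host name mongod0.net is an assumption added for illustration.

```javascript
// Illustrative helper (not a MongoDB API): builds a replica set configuration
// document of the shape that rs.initiate() accepts.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName, // must match the name given to --replSet
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

// Hypothetical hosts; replace with the machines running your mongod instances.
const replSetConfig = buildReplSetConfig("rs0", ["mongod0.net:27017", "mongod1.net:27017"]);
console.log(JSON.stringify(replSetConfig, null, 2));
```

In the mongo shell, a document of this shape could then be passed as rs.initiate(replSetConfig) on the node that should start the set.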

MongoDB Sharding

Sharding

Sharding is the process of storing data records across multiple machines, and it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data or to provide acceptable read and write throughput. Sharding solves the problem with horizontal scaling. With sharding, you add more machines to support data growth and the demands of read and write operations.

Why Sharding?

In replication, all writes go to the primary node
Latency-sensitive queries still go to the primary
A single replica set is limited to 12 nodes (raised to 50 in MongoDB 3.0)
Memory can't be large enough when active dataset is big
Local Disk is not big enough
Vertical scaling is too expensive

Sharding in MongoDB

The diagram below shows sharding in MongoDB using a sharded cluster.

The diagram has three main components, which are described below:

Shards: Shards are used to store data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
Config Servers: Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In a production environment, sharded clusters have exactly 3 config servers.
Query Routers: Query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets operations to shards and then returns results to the clients. A client sends requests to one query router, and a sharded cluster can contain more than one query router to divide the client request load. Generally a sharded cluster has many query routers.
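The interplay of the three components can be illustrated with a simplified sketch of range-based routing: the metadata maps ranges of a shard key to shards, and the router consults it to decide where an operation goes. The chunk map, the shard names and both functions below are assumptions made for this illustration; they do not reflect how mongos is actually implemented.

```javascript
// Hypothetical metadata, as a config server might hold it: each chunk covers
// the shard-key range [min, max) and lives on exactly one shard.
const chunkMap = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

// Routes an operation on a single shard-key value to the shard owning it.
function targetShard(chunks, key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? chunk.shard : null;
}

// Routes a range query [low, high] to every shard whose chunk overlaps it.
function targetShardsForRange(chunks, low, high) {
  const shards = chunks
    .filter((c) => c.min <= high && c.max > low)
    .map((c) => c.shard);
  return [...new Set(shards)];
}

console.log(targetShard(chunkMap, 150));
console.log(targetShardsForRange(chunkMap, 50, 120));
```

A query that includes the shard key can be sent to a single shard, while a query over a wide range has to be sent to several shards and the results merged by the router.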

MongoDB Create Backup

Dump MongoDB Data

To create a backup of a database in MongoDB, use the mongodump command. This command dumps all the data on your server into the dump directory. Many options are available to limit the amount of data dumped or to back up a remote server.

SYNTAX:

Basic syntax of mongodump command is as follows

>mongodump

EXAMPLE

Start your mongod server. Assuming that your mongod server is running on localhost and port 27017, open a command prompt, go to the bin directory of your MongoDB installation, and type the command mongodump.

>mongodump

The command connects to the server running at 127.0.0.1 on port 27017 and backs up all data of the server to the directory /bin/dump/. Output of the command is shown below:

The following options can be used with the mongodump command.

Syntax	Description	Example
mongodump --host HOST_NAME --port PORT_NUMBER	This command will back up all databases of the specified mongod instance.	mongodump --host only4programmers.blogspot.com --port 27017
mongodump --dbpath DB_PATH --out BACKUP_DIRECTORY	This command will back up only the specified database at the specified path.	mongodump --dbpath /data/db/ --out /data/backup/
mongodump --collection COLLECTION --db DB_NAME	This command will back up only the specified collection of the specified database.	mongodump --collection mycol --db test
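When backups are scripted, the options above are usually assembled programmatically. The following sketch builds the argument list for mongodump from an options object; buildMongodumpArgs is a hypothetical helper, and the sketch only prints the resulting command line rather than running it.

```javascript
// Illustrative helper (not part of MongoDB): turns an options object into
// the argument list for the mongodump command-line tool.
function buildMongodumpArgs(options) {
  const supportedFlags = ["host", "port", "db", "collection", "out"];
  const args = [];
  for (const flag of supportedFlags) {
    // Only emit flags that were actually provided.
    if (options[flag] !== undefined) {
      args.push(`--${flag}`, String(options[flag]));
    }
  }
  return args;
}

const dumpArgs = buildMongodumpArgs({ db: "test", collection: "mycol" });
console.log(["mongodump", ...dumpArgs].join(" "));
```

The resulting array could be handed to a process launcher such as Node's child_process.execFile on a machine where mongodump is installed.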

Restore data

To restore backup data, MongoDB's mongorestore command is used. This command restores all of the data from the backup directory.

SYNTAX

Basic syntax of mongorestore command is

>mongorestore

Output of the command is shown below:

MongoDB Deployment

When you are preparing a MongoDB deployment, you should try to understand how your application is going to hold up in production. It’s a good idea to develop a consistent, repeatable approach to managing your deployment environment so that you can minimize any surprises once you’re in production.

The best approach incorporates prototyping your set up, conducting load testing, monitoring key metrics, and using that information to scale your set up. The key part of the approach is to proactively monitor your entire system - this will help you understand how your production system will hold up before deploying, and determine where you will need to add capacity. Having insight into potential spikes in your memory usage, for example, could help put out a write-lock fire before it starts.

To monitor your deployment, MongoDB provides some commands, which are shown below:

mongostat

This command checks the status of all running mongod instances and returns counters of database operations. These counters include inserts, queries, updates, deletes, and cursors. The command also shows when you’re hitting page faults and displays your lock percentage. Frequent page faults or a high lock percentage mean that you're running low on memory, hitting write capacity, or have some other performance issue.

To run the command, start your mongod instance. In another command prompt, go to the bin directory of your MongoDB installation and type mongostat.

D:\set up\mongodb\bin>mongostat

Output of the command is shown below:

mongotop

This command tracks and reports the read and write activity of a MongoDB instance on a per-collection basis. By default mongotop returns information every second, but you can change this interval. You should check that this read and write activity matches your application's intention, and that you’re not firing too many writes to the database at a time, reading too frequently from disk, or exceeding your working set size.

To run the command, start your mongod instance. In another command prompt, go to the bin directory of your MongoDB installation and type mongotop.

D:\set up\mongodb\bin>mongotop

Output of the command is shown below:

To make mongotop return information less frequently, specify the interval in seconds after the mongotop command.

D:\set up\mongodb\bin>mongotop 30

The above example will return values every 30 seconds.

Apart from the MongoDB tools, 10gen provides a free, hosted monitoring service, MongoDB Management Service (MMS), which provides a dashboard and gives you a view of the metrics from your entire cluster.

MongoDB Java

Installation

Before we start using MongoDB in our Java programs, we need to make sure that the MongoDB Java driver and Java are set up on the machine. You can check the Java tutorial for Java installation on your machine. Now, let us check how to set up the MongoDB Java driver.

You need to download the mongo.jar file. Make sure to download the latest release of it.
You need to include mongo.jar in your classpath.

Connect to database

To connect to a database, you need to specify the database name. If the database doesn't exist, MongoDB creates it automatically.

The code snippet to connect to the database is as follows:

import com.mongodb.MongoClient;
import com.mongodb.MongoException;
import com.mongodb.WriteConcern;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;
import com.mongodb.ServerAddress;
import java.util.Arrays;

public class MongoDBJDBC{
   public static void main( String args[] ){
      try{
         // Placeholder credentials; replace with a user defined on the "test" database
         String myUserName = "myUserName";
         char[] myPassword = "myPassword".toCharArray();

         // To connect to mongodb server
         MongoClient mongoClient = new MongoClient( "localhost" , 27017 );
         // Now connect to your database
         DB db = mongoClient.getDB( "test" );
         System.out.println("Connect to database successfully");
         // In the 2.x driver, authenticate() takes the password as a char[]
         boolean auth = db.authenticate(myUserName, myPassword);
         System.out.println("Authentication: "+auth);
      }catch(Exception e){
         System.err.println( e.getClass().getName() + ": " + e.getMessage() );
      }
   }
}

Now, let's compile and run the above program to connect to our database test. You can change the path as per your requirement. We are assuming that the current version of the MongoDB Java driver, mongo-2.10.1.jar, is available in the current path.

$javac -classpath ".:mongo-2.10.1.jar" MongoDBJDBC.java
$java -classpath ".:mongo-2.10.1.jar" MongoDBJDBC
Connect to database successfully
Authentication: true

If you are using a Windows machine, you can compile and run your code as follows:

$javac -classpath ".;mongo-2.10.1.jar" MongoDBJDBC.java
$java -classpath ".;mongo-2.10.1.jar" MongoDBJDBC
Connect to database successfully
Authentication: true

The value of auth will be true if the user name and password are valid for the selected database.