Community Software 24

Let me help to build your community! Future oriented!

 
    Connect with Facebook

All site blogs

mongoDB - MySQL to NoSQL database in practice

May 6, 2012 by Torsten Wesolek   Comments (0)

, ,

elgg, development

This article describes some concepts and my first experiences with mongoDB. In the last part I give some reasons for moving Elgg from mySQL to mongoDB or another noSQL database.

NoSQL - a new hype? 

The idea of NoSQL databases is not so new. Document-oriented databases like Lotus Notes are not so far away from what we call today a NoSQL (not only SQL) database.

I followed a lot of discussion, if NoSQL databases in general are better than the conventional relational databases. In which cases one should use them and in what circumstances it really needs a relational database.

Some new web projects are based on NoSQL databases, like couchDB, Redis or mongoDB. And mostly project leaders and developers seemed to be much exited.

My practical problem

I own a shop system. No, it's not really a shop system; it's more like a big mySQL table with currently round about 800.000 data rows. Nightly my server downloads a bunch of product data from Affiliate networks and tries to add categories and subcategories. There are some light relations, e.g. between the product table and the category table. Also a fulltext index is added to give users the possibility to search e.g. for all products, having a special substring in fields like productname, description, shortdescription, brand or so.

Although I used some tricks like deactivating all indexes while download and insert, the whole process of downloading and indexing of these data takes some hours.

I saw the limit with this technique: I would run into problems, if I would add some more new partners with a bigger product portfolio.

In production environment my server is very busy to find and deliver selected data, especially if one uses fulltext search.

I decided to give a chance to a NoSQL database. I heard, they are good for handling large amount of mostly data. And that is my use case.

Selecting the right database system

There are some database systems out there, and they are good depending on your tasks and environment.

If you mainly want to access the database with REST, e.g. with you JavaScript RIA application, then CouchCB could be the system of your choice. Have a look at cassadra, if have to handle many writes to your database.

MongoDB is very fast in reading (the main task for most web applications) and has excellent horizontal scalability. Search for mongoDB autosharding, if you want to know more.

If you want to know more about the differences between mongoDB and CouchDB please follow this link.

http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB

I liked the idea, to save and access JSON data - and mongoDB uses the BSON format - that's a more compact JSON format. I looked for the availability of native drivers for my projects language, and PHP is as good supported as many, many others. I had a view on performance and scalability. Main focus is the web application. 

Installation and first contact

MongoDB was easy to setup, it was not hard to find a good management console, written in PHP, it's easy to access the database directly in the mongo console, using JavaScript. I installed mongoDB on two different Linux distributions, and there where no problems. I installed it on my Windows, and I had no problem. With a 32 bit system you only can access 2 GB databases.

Transferring the schema

What does that mean? MongoDB is like all NoSQL databases schemaless.

In SQL databases you have a fixed set of fields in a table.

Comparing to a table in a SQL database, in mongoDB we have a collection. But such a collection may have datasets (called documents) of every structure.

In practice you would save your data of one type in one collection. But often you would embed things like simple lookups.

These days I’ve seen a database table with contact entries. One contact in this table has 115 fields, it’s not a joke. There are phone1, phone2, phone3, private_phone1, private_phone2, private_phone3, adress1, address2… and – may be the designer forgot some fields there are 20 fields for free usage.

In mongoDB you just fill the fields you need. And if you add a new field later, so what? You don’t have to change your schema. All contacts without this new information are not affected.

The second difference is: You often embed a simple lookup into your collection.

I’ll give a simple example:

You have a table of staffs and a second table where the children names are saved. The children table contains the id of the associated employee.

In mongoDB you could create a collection "staffs".

The data of that collection could be look as follows:

{

  _id : ObjectId("4f9d9183eabc88c012000118"),

  name : "Smith",

  surname : "Hajo",

  children : {

     [

 { name : "Paula" },

 { name : "Paul" }

     ]

  }

}

The _id is given by mongoDB and is used as primary key, but you can define your own if you need to.

Can you see, how easy it is, if you get more information about Paula? You simply update the document to this:

{

  _id : ObjectId("4f9d9183eabc88c012000118"),

  name : "Smith",

  surname : "Hajo",

  children : {

     [

 { name : "Paula", birthday : "2011-02-02"},

 { name : "Paul" }

     ]

  }

}

One more thing you can see: you should use short names for the keys to keep the database storage space as small as possible. I would change the structure to:

{

  _id : ObjectId("4f9d9183eabc88c012000118"),

  n : "Smith",

  sn : "Hajo",

  c : {

     [

 { n : "Paula", bd : "2011-02-02"},

 { n : "Paul" }

     ]

  }

}

That saves a lot of disk space.

Notes from praxis

Inserts:

Downloading and inserting of round about 1 Mio of data rows, creating the necessary indexes including fulltext takes round about 3 or 4 hours with mySQL.

Now I insert the data in round about half an hour into my mongoDB. Creating some additional order indexes takes also half an hour.

Queries:

Searching for indexed keys is very fast. But note: you should have enough memory. MongoDB is very fast as long as the database size fits in memory size.
It's very easy to create the queries, you only have to specify key / value pairs for the "WHERE" criteria.

As a result you will get a cursor, which you can iterate. The cursor also may have additional properties like sort.


Are mongoDB or other NoSQL database like Cassandra interesting for elgg?

In large Elgg communities we see some performance problems. One challenge is to give Elgg system managers a more performance stable system independent of the amount of users.

Using of NoSQL databases may be hype, but it would be worth to find out if it's possible the best way for the future for Elgg large communities, the best technique to handle huge databases.

In Elgg you have many relationships between several tables to archive one goal: Saving of more or less schemaless information. Look at the user's profile: If you want to add an additional field, Elgg saves it similar like mongo does. Not in the same structure, but Elgg creates e new key / value pair and saves it in relation to his entity.

I assume Elgg is an ideal candidate for a schemaless NoSQL database, much more suitable for that task than mySQL or any other relational databases. As in Elgg you can archive to add or remove a property without changing the (not existing) database schema.

MongoDB has autosharding build in - that means: Scale horizontally about as much machines as you like without compromising functionality.

Moving Elgg from mySQL to noSQL database would be possible and worth, but I can't give estimation for effort. May be, it's not so complicated, if only the database layer has to be replaced. But I fear that SQL queries are built on several parts inside of Elgg core or plugins.


New Plugin: Westors Elgg Manager

August 28, 2010 by Torsten Wesolek   Comments (0)

elgg, development, user, plugins

During the last months I developed a plugin for Elgg to manage users, friends and groups comfortable by a RIA (Rich Internet Application) interface. The client part of my plugin was developed by a JavaScript Ajax framework, called qooxdoo. Qooxdoo is a really amazing framework, IMHO one of the best and cleanest frameworks available for such tasks. Having the opportunity to learn so much more about qooxdoo in my actual projects and beeing in close touch with the qoodoo core developement team makes me feel good.

This is release 0.9 of my plugin, which already gives some nice options for managing your users. Did you know, how many users in your communiy use the registration form, but never do a click on the validation link in there e-mail? Simply look at "Not activated users" and delete these dead records, if you like.

Download or visit related Page at elgg.org

The plugin is for admins and for normal users; admins of course have options, like delete or edit user data.

The instruction manual comes in form of some screenshots with comments. Please have a view at the following images.

image

image

image

image

image

image

image

 

The last picture shows a planned feature, the SMS functionality of the plugin.

image

Download Counter Plugin

February 19, 2010 by Torsten Wesolek   Comments (0)

,

elgg, development, plugins

I was missing a download counter, like we have one in the elgg community website for every plugin.

This small extension shows a download counter on the right hand side of the file download button. (Default Elgg file plugin) To install, download the zip. Unpack in the mod file and activate it. Take care, it is below the file plugin.

Actual version is 1.6. Download here or on elgg.org

 

deleteMe Plugin

February 16, 2010 by Torsten Wesolek   Comments (0)

elgg, development, plugins

The small deleteMe plugin extends the edit profile page with a button to delete the current user and his content from the system. You can follow the discussion about this here:

http://community.elgg.org/pg/plugins/westor/read/408894/deleteme

You can download it there or directly from this site:

http://community-software-24.com/561

Use at your own risk. All your own content will be removed.

 

Tidypics Flash Upload Tool

February 16, 2010 by Torsten Wesolek   Comments (0)

development, elgg, plugins

Last year I added the fullscreen slide show to the great tidypics plugin. I had a great collaboration with the tidypics team, thank you, cash!

I often was asked for the flash upload tool, which is able to upload multible files in a comfortable manner. This is not part of the official tidypics version up to now, but you may check out my version, I dropped a zip of this here:

http://community-software-24.com/418

Use at your own risk. It works fine with IE and FF but the is a problem in Google Chrome. (At least in my version...)

 

Short URLs for Twitter & Co.

February 16, 2010 by Torsten Wesolek   Comments (0)

The following is a general issue, and you can easy abstract this for other software, eg wordpress. I will write this in relation of elgg, of course.

If webdevelopers have the the task to work with short urls, e.g. if the want to use twitter to link back to their articles, they often decide to use a url shortening service like bit.ly or other. A short url then looks like http://bit.ly/abzxy .

Also google will offer such a service under the address goog.le.

As you will know, the lenght of a twitter message is restricted to 140 characters. A short url is often the only way to link to an article at your website.

But there are a few drawbacks: If the service would be disabled or would go offline for some reason, or if the service provider would take money for every short url redirection, what would you do? May be, all your links could lead to nowhere some day.

But - one of the main problems is the SEO view. One of the main components of Search Engine Optimization is building of backlinks to your site. If you use such a short url service you do not build valuable backlinks. But why?

If the service (e.g. at bit.ly) receives a url call, they will decode this url and redirect your browser to the real address at your site. This works well with your browser, but not with any search engine. Google & Co. will not follow a redirection with a "302 moved permanently" header.

As a result they do not consider these backlinks for calculation of the so importend backlink count to your site.

What can you do?

Assuming, you have a relative short domain name, you can do a something. (community-software-24.com .... hmmmm.... not really short Undecided )
Assuming furthermore you have a webserver with rewrite possibilities (e.g. apache with mod-rewrite enabled) and assuming you have a software system with numerical ID in the database.

1. configure your server

Your server should be configured in the manner to handle the url without the "www". In my cace I should be able to call http://community-software-24.com/

2. build the URL

Find the place in your software, where the URL to your content is build. In Elgg this is the function getUrl() for an entity. Lets say, your url is build in this manner: http://www.community-software-24.com/pg/file/torwe/read/413/tidypics-version-with-flash-upload-tool , your short URL should look like this http://community-software-24.com/413 . You should write a function like getShortUrl() to do this.

3. modify your twitter service plugin

Elgg specific: You have to modify your twitter service plugin to use your shohrt url together with twitter send.

4. redirect script

Drop a script like this in your document root (eg. save as redirect.php):

require_once($_SERVER["DOCUMENT_ROOT"] . '/engine/start.php');

$guid = (int) $_GET["guid"];

$elggobject = get_entity($guid);

if (is_object($elggobject)) {

      $longurl = $elggobject->getUrl();

      if (!strstr($longurl,'http://')) {

                  $longurl = 'http://' . $_SERVER["HTTP_HOST"] . $longurl;

      }

    header("HTTP/1.0 301 Moved Permanently");

    header("Location: $longurl");

    header("Connection: close");

    exit;

}

exit;

 

5. rewrite entry

Create a rewrite entry in the server conf file or in your .htaccess, which maches with such URLs and will rewrite them to your redirect script. As you will notice, the script above will redirect with a search engine friendly 301 header to your entry.

For apache webserver this look like this:

   RewriteRule ^([0-9]+)$ /redirect.php?guid=$1 [L]

Don't forget to restart your server, if you changed your conf file.

That's it.

 

jquery datepicker isssues

January 21, 2010 by Torsten Wesolek   Comments (1)

I had a problem with the datepicker, used e.g. in the wonderful Profile Manager plugin. There you can add new fields. Fields also can be dropdowns and dates. You have to download and install the plugin if you don't have already - it's really a must!

But:

If I picked up a date (in my case the users birthday) and saved the profile, the wrong day was saved.

Datepicker is able to give back the date in several formats. In this case the timestamp is used. But that timestamp is not localised in our version of jquery datepicker, and you have to do it yourself.

So the line 51 in mod/profile_manager/views/default/input/datepicker.php in the datepick function had to be changed to this:

$('#<?php echo $strippedname; ?>_alt').val($('#<?php echo $strippedname; ?>_alt').val()/1000 - new Date().getTimezoneOffset()*60);

As a result of my hint this plugin bug is fixed now.
But this hint could be helpful to some developer who would like to use this datepicker timestamp.

BLOG claiming for Technorati

January 20, 2010 by Torsten Wesolek   Comments (0)

, , , ,

As you will know, Technorati is for BLOGS as important as google for search.
Therefore I wanted to claim my Blog (Elgg System) for technorati.
Claiming your BLOG will help you to increase the number of visitors. You can place your blog in up to 20 categories (type in tags) and you can give a blog description.

For claiming your blog you will get a claim code. This code you have to insert at one of your post. You can delete this Post later or simply remove the code from the post.

See here for a more detailed tutorial .

My problem and the problem of many other users was a message like this:

"... Unfortunately, we have encountered a problem reading your site's data...."

The feed was readable with several feedreaders, but a check with Feedvalidator showed an incomplete permalink. This feed therefore was invalid. (Some small other issues where shown, but the where not breaking the feed.)

Actually I did a small change, and as a result my claim code was found.

file: /views/rss/object/default.php

 

<item>
 <guid isPermaLink='true'><?php
$permalink = $vars['entity']->getURL();
if (!strstr($permalink,'http://')) {
$permalink = 'http://' . $_SERVER["HTTP_HOST"] . $permalink;
}
echo htmlspecialchars($permalink);
 ?></guid>
 <pubDate><?php echo date("r",$vars['entity']->time_created) ?></pubDate>
 <link><?php echo htmlspecialchars($permalink); ?></link>
 <title><![CDATA[<?php echo $title; ?>]]></title>

 

<item>
 <guid isPermaLink='true'><?php
$permalink = $vars['entity']->getURL();
if (!strstr($permalink,'http://')) {
$permalink = 'http://' . $_SERVER["HTTP_HOST"] . $permalink;
}
echo htmlspecialchars($permalink);
 ?></guid>
 <pubDate><?php echo date("r",$vars['entity']->time_created) ?></pubDate>
 <link><?php echo htmlspecialchars($permalink); ?></link>
 <title><![CDATA[<?php echo $title; ?>]]></title>

...      

 

The problem was the relative link. Now, if getURL() ist delivering a relative link, this is extended to a full URL.