[Pidgin Plugin] Yahoo Messenger - Buzz blocker

Just a small piece of code to block annoying BUZZ messages from your *Yahoo* buddies.

use Purple;

%PLUGIN_INFO = (perl_api_version => 2,
	        name             => "Buzz blocker",
	        version          => "1.0",
	        summary          => "Block all BUZZ messages",
	        description      => "Block all BUZZ messages",
	        author           => "Huy Phan ",
	        url              => "",
	        load             => "plugin_load",
	        unload           => "plugin_unload");

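# Callback for "receiving-im-msg": returning a true value cancels the message.
sub conv_received_msg {
    my ($account, $sender, $message, $conv, $flags, $data) = @_;
    # \Q...\E escapes regex metacharacters (such as ".") in the sender name.
    if ($message =~ /\Q$sender\E has buzzed you!/)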
    {
        return 1;
    }
    return 0;
}

sub plugin_init {
    return %PLUGIN_INFO;
}

sub plugin_load {
    my $plugin = shift;

    my $conv = Purple::Conversations::get_handle();
    Purple::Signal::connect($conv, "receiving-im-msg", $plugin,
                            \&conv_received_msg, "received im message");
    Purple::Debug::info("plugin_load()", "loading Buzz Blocker");
}

sub plugin_unload {
    my $plugin = shift;
    Purple::Debug::info("plugin_unload()", "unloading Buzz Blocker");
}

Save the code above to a file with a .pl extension under ~/.purple/plugins/ and restart Pidgin.
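
The install step can be sketched as a small shell helper. This is only a sketch and not part of the plugin: the function name install_purple_plugin and the file name buzzblocker.pl used below are hypothetical, and the default target is the ~/.purple/plugins/ directory mentioned above.

```shell
# Hypothetical helper: copy a Perl plugin into the per-user Pidgin plugin directory.
# Usage: install_purple_plugin <plugin.pl> [plugin_dir]
install_purple_plugin() {
    src=$1
    dest_dir=${2:-$HOME/.purple/plugins}
    # Only accept files with a .pl extension, as required for Perl plugins.
    case "$src" in
        *.pl) ;;
        *) echo "plugin file must end in .pl: $src" >&2; return 1 ;;
    esac
    # Create the plugin directory if it does not exist yet, then copy the file.
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/"
}
```

For example, run `install_purple_plugin buzzblocker.pl` and then restart Pidgin.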

Posted: September 9th, 2010
at 4:34pm by admin

Categories: Programming, linux

Comments: No comments


HDFS over Webdav for Hadoop 0.20.1

HDFS over Webdav is not the best way to mount HDFS and use it outside its original context, but it is quick to install for serving static data on HDFS over HTTP.
The current version of HDFS-Webdav only supports Hadoop 0.18.3 and earlier versions.
Since Hadoop 0.20.1 has become the latest stable version, I have made a fix for HDFS-Webdav.

The code can be found at: http://github.com/huyphan/HDFS-over-Webdav
Feel free to pull my code and watch the repository for updates.

My repository is also listed on the original author’s homepage (http://www.hadoop.iponweb.net/Home/hdfs-over-webdav), which you should visit for the full instructions and the known issues when installing HDFS-Webdav.

Have fun with Hadoop :)

Posted: October 20th, 2009
at 5:52pm by admin

Categories: Hadoop, Programming, distributed computing, linux

Comments: No comments


[Hadoop] Mount HDFS using built-in fuse library

There are many ways to mount HDFS as a standard file system; guides and reviews can be found on the MountableHDFS wiki page.
I started with WebDAV and DAVFS, which worked quite well until my data grew to 1 GB; after that, performance degraded and an `ls` in the root directory took more than 5 minutes.
Fuse-dfs was my second try, and it seems to be the best choice since it ships with the Hadoop package.
Compiling Fuse-dfs is impossible on Hadoop 0.18.3 because of conflicts between libhdfs and the fuse library, so I tried the latest stable version of Hadoop (0.20.1). If the guide on the MountableHDFS wiki page doesn’t work for you (which happens in most cases), these steps should help:

1. Download and extract Hadoop 0.20.1:

$ wget http://www.apache.org/dist/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
$ tar xvzf hadoop-0.20.1.tar.gz

2. Download and extract Apache Ant 1.7.1 if you have an older version:

$ wget http://www.apache.org/dist/ant/binaries/apache-ant-1.7.1-bin.tar.gz
$ tar xvzf apache-ant-1.7.1-bin.tar.gz

Don’t forget to add the bin directory of Apache Ant that you’ve just downloaded to your PATH variable.

3. Set your JAVA_HOME variable if you haven’t already:

$ export JAVA_HOME=/usr/java/jdk1.6.0_16

4. Switch to the hadoop directory and start compiling libhdfs (Linux-i386-32 below is the build directory for 32-bit Linux; adjust it to match your platform):

[hadoop@localhost hadoop-0.20.1] $ ant compile-c++-libhdfs -Dislibhdfs=1
[hadoop@localhost hadoop-0.20.1] $ mkdir -p build/libhdfs
[hadoop@localhost hadoop-0.20.1] $ cp -r build/c++/Linux-i386-32/lib/* build/libhdfs/
[hadoop@localhost hadoop-0.20.1] $ ant compile-contrib -Dislibhdfs=1 -Dfusedfs=1 -Dlibhdfs-fuse=1

You have now successfully compiled fuse-dfs. To mount HDFS using this library and to learn about the current known issues, please refer to the MountableHDFS wiki page.
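
Before running the ant targets in step 4, the environment from steps 2 and 3 can be set up and checked in one go. This is a minimal sketch: it assumes Ant was extracted into the current directory and that the JDK lives at the path used in step 3, and check_build_env is a hypothetical helper name.

```shell
# Assumed locations; adjust to wherever you extracted Ant and installed the JDK.
ANT_HOME="$PWD/apache-ant-1.7.1"
JAVA_HOME="/usr/java/jdk1.6.0_16"
PATH="$ANT_HOME/bin:$PATH"
export ANT_HOME JAVA_HOME PATH

# Hypothetical helper: report which of the expected tools are missing.
# Returns 0 when both executables are found, 1 otherwise.
check_build_env() {
    status=0
    [ -x "$ANT_HOME/bin/ant" ] || { echo "ant not found under $ANT_HOME" >&2; status=1; }
    [ -x "$JAVA_HOME/bin/java" ] || { echo "java not found under $JAVA_HOME" >&2; status=1; }
    return $status
}
```

Running `check_build_env` prints nothing when both tools are in place, and names the missing one otherwise.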

Posted: October 7th, 2009
at 10:07am by admin

Categories: distributed computing, linux

Comments: No comments


[git] Change your previous commit

This post is a note to myself for when I deal with git. It shows how to go back and change a previous commit without losing any of the new work you’ve done since you committed. The source of this post can be found at this link.

  1. Save and stash your work so far.
  2. Look at git log and copy the first 5 or so characters from the ID of the commit you want to edit onto your clipboard.
  3. Start the interactive rebase process, pasting in the characters from the ID followed by ^ (the parent of that commit, so that the commit itself shows up in the list): git rebase --interactive ID^
  4. Your editor will come up with several lines like pick d3adb33 Commit message, one line for each commit starting from the one you want to edit.
  5. Change the word “pick” to “edit” in front of the commit you want to change.
  6. Save and quit.
  7. Edit your project files to make the correction, then run git commit --all --amend.
  8. After you’ve committed the fixed version, do git rebase --continue.
  9. Git will do its magic, recreating all the commits since then.
  10. You might need to resolve some conflicts, if the change you made affected later commits. Follow Git’s instructions to resolve those.
  11. Once the rebase is done, re-apply the stash and continue happily with your life.
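
The steps above can be sketched as two shell functions, one to run before you fix the files and one after. This is only a sketch under a few assumptions: the function names are hypothetical, the commit you edit is not the very first commit of the repository (it needs a parent), and the amend keeps the original commit message via --no-edit.

```shell
# Steps 1-6: stash current work and open the interactive rebase.
# Usage: start_commit_edit <commit-id>
start_commit_edit() {
    commit=$1
    stashed=no
    # Step 1: stash uncommitted changes to tracked files, if there are any.
    if ! git diff --quiet HEAD; then
        git stash || return 1
        stashed=yes
    fi
    # "$commit^" is the parent of the commit, so the commit itself is listed
    # in the editor; change its "pick" to "edit", then save and quit.
    git rebase --interactive "$commit^"
}

# Steps 7-11: run this after correcting the project files.
finish_commit_edit() {
    # Step 7: fold the corrections into the commit being edited.
    git commit --all --amend --no-edit || return 1
    # Steps 8-10: replay the later commits (resolve conflicts if git asks).
    git rebase --continue || return 1
    # Step 11: bring back the stashed work, but only if step 1 stashed something.
    if [ "$stashed" = yes ]; then
        git stash pop
    fi
}
```

Typical use is `start_commit_edit d3adb33`, then fixing the files, then `finish_commit_edit`. If the rebase stops on a conflict, follow git's instructions and finish with git rebase --continue by hand.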

Posted: September 15th, 2009
at 11:23pm by admin

Categories: linux

Comments: No comments


[Hadoop] “Too many fetch-failures” or “reducer stucks” issue

I’m posting the solution here to help any ‘Hadoopers’ who have the same problem. This issue has been raised many times on the Hadoop mailing list, but no answer has been given so far.

After installing a Hadoop cluster and trying to run some jobs, you may see the reducers get stuck while the TaskTracker log on one of the worker nodes shows these messages:

INFO org.apache.hadoop.mapred.TaskTracker: task_200801281756_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >
INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000000_0 0.2727273% reduce > copy (9 of 11 at 0.00 MB/s) >

INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: task_001_r_000000_0 … killing it

The reducer failed to fetch the map outputs from the other nodes, so double-check your Linux network and Hadoop configuration:
1. Make sure that all the required parameters are configured in hadoop-site.xml and that every worker node has the same content in this file.
2. The URIs for the TaskTracker and HDFS should use host names instead of IP addresses. I have seen Hadoop clusters that used IP addresses in these URIs: they could start all the services and execute jobs, but the tasks never finished successfully.
3. Check /etc/hosts on all the nodes and make sure that each host name is bound to its network IP, not the loopback one (127.0.0.1). Also check that all the nodes can reach each other by host name.
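
The /etc/hosts check from point 3 can be sketched as a small shell function to run on every node. This is a minimal sketch: the function name loopback_bound_hosts is hypothetical, and it treats every 127.x.x.x entry that is not a localhost alias as suspicious.

```shell
# Print host names that a hosts file binds to a loopback (127.x.x.x) address,
# ignoring the usual localhost aliases. No output means the file looks fine.
# Usage: loopback_bound_hosts [hosts_file]   (defaults to /etc/hosts)
loopback_bound_hosts() {
    hosts_file=${1:-/etc/hosts}
    awk '
        /^[ \t]*#/ { next }                    # skip comment lines
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # stop at a trailing comment
                if ($i !~ /^localhost/) print $i
            }
        }
    ' "$hosts_file"
}
```

If a node's own host name shows up in the output, move it to a line with the node's network IP instead.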

Still, it doesn’t make sense to me that Hadoop always tries to resolve an IP address using the hostname. I consider this a bug in Hadoop and hope it will be fixed in the next stable version.

Posted: August 28th, 2009
at 11:20pm by admin

Categories: Programming, distributed computing, linux

Comments: 1 comment


[Scribe] Another approach to make Scribe support HDFS

1. What is Scribe ?

“Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine. Scribe was developed at Facebook and released as open source”

2. Scribe and its latest announcement about HDFS support

On 2009-06-06, Scribe announced HDFS support (you can read the full message here). Scribe writes to HDFS directly using libhdfs (the C interface to HDFS provided by the Hadoop package) without waiting for the file to be rotated. Unfortunately, the HDFS append feature (which Scribe needs) is not enabled in any stable version of Hadoop, so your only choice is to use the trunk version of Hadoop.

“It has been disabled in the 0.20.0 release due to stability issues. It has also been disabled in 0.19.1, which means that there is currently no stable Hadoop release with a functioning HDFS append function.”
(quoted from cloudera.com)

I tried to set up Scribe with the trunk version of Hadoop, but even after fixing some minor bugs caused by incompatible function calls, I still could not make Scribe log messages to HDFS. So I switched everything back to the stable versions and started to customize Scribe my own way.

3. Another way to make Scribe support HDFS

I added a new feature to Scribe named “HDFSSync”; you can check out the code at:
http://github.com/huyphan/Scribe-with-HDFS-support/tree/master

This feature is implemented as a new “store type” for Scribe (the current HDFS support on trunk is implemented as a new “file type”).

The approach is quite simple: instead of appending messages to an HDFS file every time, Scribe logs the messages to the local machine and sends the file to HDFS when it is rotated. As the stable version of Scribe only supports daily and hourly rotation, I created a new configuration parameter that allows rotating every few minutes.

This is an example configuration file:

port=1463
max_msg_per_second=2000000
check_interval=3

# DEFAULT

<store>

category=default
type=hdfssync

file_path=/tmp/scribetest
hdfs_dir=hdfs://192.168.4.93:9000/scribe
period_length=10
add_newlines=1

target_write_size=20480
max_write_interval=1
buffer_send_rate=2
retry_interval=30
retry_interval_range=10
base_filename=digit_log

</store>

HDFSSync needs 3 parameters:
- file_path: directory on the local machine where the temporary log file is stored before it is sent to HDFS.
- hdfs_dir: destination directory that the log files will be copied to.
- period_length: the time (in minutes) between log file rotations.

(The rotate_period parameter is ignored when this store type is used.)
This customized version works with all versions of Hadoop since 0.18.3. I’ve done some stress testing and the results are quite good.
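
The rotate-then-upload idea can be illustrated outside Scribe with a few lines of shell. This is not the HDFSSync code itself, just a sketch of the same approach: the function name sync_rotated_logs and the UPLOAD_CMD override are hypothetical, and it assumes the newest file in the directory is the one still being written.

```shell
# Upload every rotated log file to HDFS and delete the local copy afterwards.
# The newest file is assumed to be the active log and is left alone.
# Usage: sync_rotated_logs <local_dir> <hdfs_dir>
# UPLOAD_CMD (hypothetical override, handy for dry runs) defaults to "hadoop fs -put".
sync_rotated_logs() {
    local_dir=$1
    hdfs_dir=$2
    # Most recently modified entry: presumed to be the file still in use.
    newest=$(ls -t "$local_dir" | head -n 1)
    for f in "$local_dir"/*; do
        [ -f "$f" ] || continue
        [ "$(basename "$f")" = "$newest" ] && continue
        # Remove the local file only after the upload succeeded.
        ${UPLOAD_CMD:-hadoop fs -put} "$f" "$hdfs_dir/" && rm -f "$f"
    done
}
```

With the sample configuration above this would be called as `sync_rotated_logs /tmp/scribetest hdfs://192.168.4.93:9000/scribe`, for instance from cron.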

Posted: July 3rd, 2009
at 7:40pm by admin

Categories: Programming, linux

Comments: 5 comments


[Pidgin Plugin] Message Filtering

As there are some emoticons that I really hate to see when talking with my friends on Yahoo or Gtalk, I wrote a small Pidgin plugin to filter them out of received messages.

It took me about 5 hours to understand the architecture of Pidgin and write this plugin. The source code and compiled files can be found here.

To install this plugin:
1. Copy the binary file (message_filter.so) to ~/.purple/plugins
2. Restart Pidgin; a new plugin named Message Filter … will appear in your plugin list.
3. List all the words (separated by spaces) that you never want to see in a conversation.

Now I feel very comfortable when chatting with all my emoticon-addicted friends, and if you’re curious about my blacklist, here it is:

 :| :-| ^^ ^_^ :-/ :-? /:) :( 

Posted: March 31st, 2009
at 10:15am by admin

Categories: Programming, linux

Comments: 1 comment


[Nginx Module] Authentication using Memcached

If you don’t know about Nginx, take a look at its wiki page.

I spent a month digging into the Nginx code, and the result so far is a module named ngx_memcached_auth_module.

This module uses the requested file name and the token parameter from the URL to fetch a value from the memcached server. If no value is found, the server returns a 403 (Forbidden) response; otherwise, the user is free to access the file.
If the “ip_check” option is turned on, it also checks that the requester’s IP matches the returned value.

This module is useful in specific cases such as video streaming or download restriction.

Here is a sample configuration for this module:

location /download/ {
    root html;
    index index.html index.htm;
    set $memcached_auth_token_key bogus;
    if ($args ~ token=(.*))
    {
        set $memcached_auth_token_key $1;
    }
    memcached_auth /memcached/;
    ip_check 1;
}

location /memcached/ {
    root html;
    index index.html index.htm;
    set $memcached_key bogus;
    if ($args ~ token=(.*))
    {
        set $memcached_key $1;
    }
    memcached_pass 127.0.0.1:11211;
    default_type text/html;
}

In the sample above:
- download is the location that stores the requested files
- memcached is the temporary location that uses ngx_memcached_module of nginx to fetch value from memcached.
- the directive ip_check in download location is optional.

The source code can be found here.
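
To try the sample configuration, a token first has to be stored in memcached. The sketch below builds the `set` command of the memcached text protocol. It rests on assumptions that the description above does not confirm: that the key is the token alone (as `$memcached_key` suggests in the sample) and that the stored value is the client IP checked by ip_check. The function name memcached_set_cmd is hypothetical.

```shell
# Print a memcached text-protocol "set" command for one key/value pair.
# Usage: memcached_set_cmd <key> <value> [ttl_seconds]
memcached_set_cmd() {
    key=$1
    value=$2
    ttl=${3:-60}
    # Format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    # ${#value} counts characters, which equals bytes for plain ASCII values.
    printf 'set %s 0 %s %s\r\n%s\r\n' "$key" "$ttl" "${#value}" "$value"
}
```

Assuming a netcat binary named nc is installed, `memcached_set_cmd abc123 203.0.113.7 300 | nc 127.0.0.1 11211` would store the token for five minutes, after which a request to a file under /download/ with ?token=abc123 from that IP should be allowed.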

Posted: March 25th, 2009
at 7:44pm by admin

Categories: Programming

Comments: No comments