Monday, April 19, 2010

SSH and SCP without using a password

I keep ssh-ing and scp-ing between two of the machines in my lab. After about a month of entering my password, I finally had enough and decided to set up key-based authentication for password-less SSH and SCP. Here are the steps:

If you have never connected to the other system before, then SSH to the system in the normal way, i.e. using a password:

$ ssh vinay@some.domain.com

You will be asked whether you want to add the machine's RSA fingerprint to your system's list of known hosts. Say 'yes' and you will be asked for the password... enter password... connected. You will not be asked this question on later connections.

OK, now let's set up a password-less connection. This involves copying the public half of a key pair generated on your system onto the machine that you want to connect to. First check if a key is already present on your system. If it is present, we can use it, else we will have to generate a new one. Here is how you check if you already have the key:

$ ls -l ~/.ssh
total 24
-rw------- 1 vinay vinay 1204 2010-04-19 12:24 authorized_keys
-rw------- 1 vinay vinay  668 2010-04-13 12:31 id_dsa
-rw------- 1 vinay vinay  602 2010-04-13 12:31 id_dsa.pub
-rw-r--r-- 1 vinay vinay 8496 2010-03-20 18:40 known_hosts

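The file that we will have to copy over is "id_dsa.pub". Note that OpenSSH 7.0 and later disable DSA keys by default, so on a newer system generate an RSA or Ed25519 key instead (ssh-keygen -t rsa or ssh-keygen -t ed25519) and substitute "id_rsa" or "id_ed25519" for "id_dsa" throughout. If you do not see the file on your system, then here is how you generate one: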

$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/vinay/.ssh/id_dsa): <hit enter key> 
Enter passphrase (empty for no passphrase): <hit enter key>
Enter same passphrase again: <hit enter key>
Your identification has been saved in /home/vinay/.ssh/id_dsa.
Your public key has been saved in /home/vinay/.ssh/id_dsa.pub.

If you check the ~/.ssh folder, you should now see the "id_dsa.pub" file. Now that you have the key (or if you had it before), you need to copy it to the machine that you want to connect to. Here is how you can do that:

$ ssh-copy-id -i ~/.ssh/id_dsa.pub vinay@some.domain.com

The ssh-copy-id command takes the contents of the "~/.ssh/id_dsa.pub" file on the current machine (i.e. the key that you generated or already had) and appends it to the end of the "~/.ssh/authorized_keys" file on the machine that you want to connect to.
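To make the "append to authorized_keys" step concrete, here is a small Python sketch of the idea. It is not how ssh-copy-id is actually implemented; the function name append_public_key and the sample key string in the usage note are made up for illustration, and the only assumption is that authorized_keys holds one public key per line.

```python
from pathlib import Path


def append_public_key(pub_key_line: str, authorized_keys: Path) -> bool:
    """Append one public key line to authorized_keys if it is not already there.

    Returns True when the file was modified, False when the key was present.
    """
    key = pub_key_line.strip()
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    authorized_keys.parent.mkdir(parents=True, exist_ok=True)
    with authorized_keys.open("a") as handle:
        # Make sure the new key starts on its own line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    return True
```

Calling append_public_key("ssh-dss AAAA... vinay@forest", Path.home() / ".ssh" / "authorized_keys") twice adds the key only once, which is why running the copy step again does no harm.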

That's it!! You can now connect directly, without using a password. Test the setup:

$ ssh vinay@some.domain.com

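Note 1: If you do not have the ssh-copy-id command, then you can do the copy manually. The first two commands run on your machine; the rest run on the remote machine once you are logged in:

$ scp ~/.ssh/id_dsa.pub vinay@some.domain.com:temp_dsa.pub
$ ssh vinay@some.domain.com
$ mkdir -p ~/.ssh
$ cat temp_dsa.pub >> ~/.ssh/authorized_keys
$ rm temp_dsa.pub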

Note 2: Make sure that the key that you generated is secure. Change the permissions of the key files to 600, if required:

$ chmod 600 ~/.ssh/id_dsa
$ chmod 600 ~/.ssh/id_dsa.pub

Note 3: If you follow the above procedure and you are still unable to connect without a password, then check the "/var/log/auth.log" log file on the machine that you are connecting to. Here are two possible scenarios:

1. If you see a message like "Public key <fingerprint> from <source> blacklisted", then it means that your ssh key may have been compromised. In that case, update your openssl and openssh packages, generate a new key, and repeat the whole procedure.

2. If you see a message like "bad ownership or modes for directory /home/<user>", then it is most likely that your home folder has group write access... ssh does not like that at all. You will have to chmod your home directory permissions to 750.
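If you would rather check the permissions up front than wait for the log message, a rough Python sketch like the one below can flag the problem. The function name is my own, and treating "group or other write access" as the thing to look for is an assumption based on the second scenario above, not a full copy of the checks sshd performs.

```python
import os
import stat


def is_group_or_world_writable(path: str) -> bool:
    """Return True if the group or others have write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
```

For example, is_group_or_world_writable(os.path.expanduser("~")) returning True on the remote machine suggests that the home directory needs the chmod described above.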

Happy password-less SSHing and SCPing ;)

Sunday, April 18, 2010

Two Useful Disk Utilization Commands: df and du

Here are two things that we keep doing:

1. Finding out the amount of free space available on the disk
2. Finding out the amount of disk space a particular folder is taking

If I am in front of the Linux machine with a Desktop environment, I can just right-click and see the properties. But that is a totally uncool thing to do when working on Linux, right?

So, here is the command-line way of doing it:

1. To find out the amount of free space available on the disk, just type:

$ df -h

"df" is the tool used to report the file system disk space usage. Using the "-h" option prints the sizes in human readable format (like 10K, 11M, 40G, 10T, etc).

2. To find out the amount of disk space a particular folder is taking, go to that particular folder and type:

$ du -sch

or, you can give the folder name(s) as arguments to "du", for example:

$ du -sch /home/vinay/projects /home/vinay/local /home/vinay/music

"du" is the tool to estimate the file space usage. The "-s" option does a summary and displays only the total for each argument. The "-c" option produces a grand total. The "-h" option prints the sizes in human readable format.
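If you need the same numbers from inside a script, here is a rough Python sketch of both ideas. The function names are my own. Note one assumption that matters: the directory total below adds up file sizes, whereas du reports the disk blocks actually allocated, so the two can differ somewhat.

```python
import os
import shutil


def free_space_bytes(path="/"):
    """Free space on the file system that contains path (roughly what df shows)."""
    return shutil.disk_usage(path).free


def directory_size_bytes(root):
    """Sum of file sizes under root, skipping symbolic links."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def human_readable(num_bytes):
    """Format a byte count with a unit suffix, similar in spirit to -h."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024
```

Something like print(human_readable(directory_size_bytes("/home/vinay/projects"))) then gives a quick per-folder figure.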

If you want to know more about these commands, see these links:
1. du: http://www.linfo.org/du.html
2. df: http://www.linfo.org/df.html

Thursday, April 15, 2010

Converting XLS to CSV on Linux

While programming, it's a real pain if the data that you want to process is present in xls spreadsheets. If it's only one xls file, then you can open it using MS Excel or OpenOffice and save it as a csv file. Once you get the data in csv format, life becomes simple. Almost all programming languages have libraries for parsing csv files. With scripting languages like Python or Ruby, you can literally play around with csv files with just a few lines of code.
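To show what "a few lines of code" looks like, here is a tiny Python sketch using the standard csv module. The column names and the sample rows are invented purely for illustration; they are not from my actual data.

```python
import csv
import io

# Made-up sample data standing in for a real csv file.
sample = "species,family,count\nQuercus alba,Fagaceae,12\nAcer rubrum,Sapindaceae,7\n"


def total_count(csv_text):
    """Add up the 'count' column of a csv document given as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["count"]) for row in reader)
```

With a real file you would pass open("myfile.csv", newline="") to csv.DictReader instead of the StringIO wrapper.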

But what if the data is contained in many xls files? This is a typical situation that a programmer may face when working with non-programmers. For example, in the research project that I am working on, we receive data on plant species from botanists at the Smithsonian Institution. The data that they send is usually spread across 200 or so xls spreadsheets. In such situations, it's great to have a command line tool that can convert xls to csv.

There are two command line tools that I found that do a pretty good job in converting xls to csv. They both have their advantages and disadvantages (which I will be talking about). Here they are:

1. xls2csv (by V.B.Wagner)

Here is the project webpage and here is the direct link to download the source.

After installing it, here is how I ran it:

$ xls2csv myfile.xls > myfile.csv

xls2csv prints the output to stdout, which can then be redirected to a file.

The good news:

1. Installation is straightforward.

2. It works great, even when the xls file has multiple worksheets. In that case, xls2csv will print all the worksheets into the csv file, with the contents of each worksheet separated by a form-feed character (^L).

3. In any line, even if the first "n" columns are empty and data is present in the "(n+1)th" column, xls2csv will recognize that line and include it in the generated csv file
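Since the worksheets come out separated by a form-feed character, they are easy to pull apart again afterwards. Here is a minimal Python sketch of that, assuming no cell in the spreadsheet itself contains a form feed; the function name split_worksheets is my own.

```python
import csv
import io


def split_worksheets(csv_text):
    """Split form-feed separated csv text into one list of rows per worksheet."""
    sheets = []
    for chunk in csv_text.split("\f"):
        if chunk.strip():
            rows = list(csv.reader(io.StringIO(chunk.strip("\n"))))
            sheets.append(rows)
    return sheets
```

Each element of the returned list can then be written out to its own csv file with csv.writer if you want one file per worksheet.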

The bad news:

1. xls2csv has problems with dates. If the xls file has dates, like "03/31/85" or "31-Mar-1985" etc, then xls2csv will not reproduce the date as is. Instead, it will convert the date into a number while generating the csv file. The man page for xls2csv mentions that using the -f flag to specify the date format will solve this problem. But I could never get it to work; several others have also reported this problem on discussion boards.
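One possible workaround is to convert the numbers back to dates after the fact. The Python sketch below assumes that the numbers are Excel serial day numbers in the 1900 date system (days counted from 1899-12-30), which I have not confirmed for every file; it is only valid for serials of 61 and above because of Excel's well-known 1900 leap-year quirk. The function name is my own.

```python
from datetime import date, timedelta

# Day zero of the Excel 1900 date system, valid for serials >= 61.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date (time of day dropped)."""
    return EXCEL_EPOCH + timedelta(days=int(serial))
```

Under that assumption, a cell holding "31-Mar-1985" would show up as 31137 in the csv, and excel_serial_to_date(31137) gives back date(1985, 3, 31).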

So, if your xls files do not have dates, this is the thing for you. You will be happy with the results.

Here is the second tool:

2. xls2csv (by Ken Prows)

This tool is also called xls2csv, but it is a different implementation by a different author. This one is in Perl by Ken Prows, whereas the first one was in C by V.B.Wagner (Note: Both these tools install to /usr/local/bin. So, if you will be installing both, make sure to configure them so that they have separate installation paths).

Here is the project webpage and here is the direct link to download the source.

Installing this can be a pain if you don't have the Perl modules that xls2csv requires. However, installing the required modules using CPAN is very easy. If you are having problems with the installation, here is a very nice tutorial on installing xls2csv.

After installing it, here is how I ran it:

$ xls2csv -x myfile.xls -c myfile.csv

The good news:

1. Unlike the 1st tool, this works great with dates. The generated csv file will have the dates in the exact same format as the original xls file.

The bad news:

1. Installation can become a bit tricky if you don't have the required Perl modules.

2. If the xls has multiple worksheets, by default, it will convert only the first worksheet to csv. However, it does support a -w flag wherein you can specify the sheet name that you would like to convert:

$ xls2csv -x myfile.xls -c myfile.csv -w worksheet_name

It also supports a -W flag that lists all the worksheets in the xls file. Usage:

$ xls2csv -W -x myfile.xls

3. If the first column is empty, the entire row is ignored. So if you have an xls file in which the first column is blank but there is tons of data from the second column onwards, xls2csv will generate an empty csv file... That's bad !!

Well, if you have xls files which have lots of dates (and hopefully single worksheets per xls file), this is the tool for you.

Now let's batch-convert...

Here is my Ruby script to batch-convert xls files to csv. It takes 2 arguments: 1) the source directory that has all the xls files and 2) the target directory where you want to save the generated csv files. So, the usage is:

$ ruby xls2csv.rb "/home/vinay/myxls" "/home/vinay/mycsv"

where "myxls" is the directory which has all the xls files and "mycsv" is the directory into which all the csv files will be generated.

Here is the Ruby script:

# Author: Vinay Kumar Bettadapura

if ARGV.length != 2
    puts "usage: \"ruby xls2csv.rb source_dir target_dir\""
    exit -1
end

source_dir = ARGV[0]
target_dir = ARGV[1]

if !File.directory?(target_dir)
    puts "target_dir \"#{target_dir}\" is not a valid directory"
    exit -1
end

source_entries = []
begin
    # Keep only the .xls files; this also drops "." and ".."
    source_entries = Dir.entries(source_dir).sort.select{|file|
        File.extname(file).downcase == ".xls"
    }
rescue SystemCallError => e
    puts "source_dir \"#{source_dir}\" is not a valid directory"
    exit -1
end

if source_entries.empty?
    puts "source_dir \"#{source_dir}\" has no xls files"
    exit 0
end

source_entries.each{|file|
    source_file = File.join(source_dir, file)
    # Replace only the trailing extension, not every ".xls" in the name
    target_file = File.join(target_dir, File.basename(file, ".*") + ".csv")

    puts "Converting \"#{source_file}\" to \"#{target_file}\""
    `xls2csv \"#{source_file}\" > \"#{target_file}\"`
}

puts "Done..."

The Ruby script uses the 1st xls2csv tool. If you want to use the 2nd xls2csv tool, then replace the line

`xls2csv \"#{source_file}\" > \"#{target_file}\"`

inside the loop at the end of the Ruby script with

`xls2csv -x \"#{source_file}\" -c \"#{target_file}\"`
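For comparison, the same batch job can be sketched in Python without going through a shell, which sidesteps quoting problems with odd file names. This is only a sketch under the assumption that the 1st xls2csv tool is on your PATH and writes to stdout as described above; the function names plan_conversions and convert_all are my own.

```python
import os
import subprocess


def plan_conversions(source_dir, target_dir):
    """Return (source_file, target_file) pairs for every .xls file in source_dir."""
    pairs = []
    for name in sorted(os.listdir(source_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".xls":
            pairs.append((os.path.join(source_dir, name),
                          os.path.join(target_dir, stem + ".csv")))
    return pairs


def convert_all(source_dir, target_dir):
    """Run xls2csv on each planned pair, redirecting stdout into the csv file."""
    for source_file, target_file in plan_conversions(source_dir, target_dir):
        print(f'Converting "{source_file}" to "{target_file}"')
        with open(target_file, "w") as out:
            subprocess.run(["xls2csv", source_file], stdout=out, check=True)
```

Calling convert_all("/home/vinay/myxls", "/home/vinay/mycsv") mirrors the Ruby usage shown earlier. For the 2nd tool, the argument list would become ["xls2csv", "-x", source_file, "-c", target_file] with no stdout redirection.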

Sunday, April 11, 2010

Mounting External USB Harddisk on Ubuntu

I connected an old external USB harddisk to my Ubuntu machine and found that it was not auto-mounting (sometimes these old USB drives don't auto-mount). So, I had to mount it manually. Here are the steps I followed:

Step 1: Before plugging in the USB harddisk, check the available partitions on your system.

vinay@forest:~$ sudo fdisk -l

Disk /dev/sda: 80.0 GB, 80000000000 bytes
255 heads, 63 sectors/track, 9726 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000080

   Device Boot      Id  System
/dev/sda1   *       83  Linux
/dev/sda2            5  Extended
/dev/sda5           82  Linux swap / Solaris

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00003588

   Device Boot      Id  System
/dev/sdb1           fd  Linux raid autodetect

You can see that there are two disks: /dev/sda and /dev/sdb, each introduced by a line starting with "Disk".

Step 2: Now plug in the USB harddisk and see which new disk shows up.

vinay@forest:~$ sudo fdisk -l

Disk /dev/sda: 80.0 GB, 80000000000 bytes
255 heads, 63 sectors/track, 9726 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000080

   Device Boot      Id  System
/dev/sda1   *       83  Linux
/dev/sda2            5  Extended
/dev/sda5           82  Linux swap / Solaris

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00003588

   Device Boot      Id  System
/dev/sdb1           fd  Linux raid autodetect

Disk /dev/sdc: 999.5 GB, 999501594624 bytes
255 heads, 63 sectors/track, 121515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0002ae3f

   Device Boot      Id  System
/dev/sdc1            7  HPFS/NTFS

You can see that:
1. /dev/sdc (the last "Disk" line) has now shown up. This is the USB harddisk.
2. Its partition, /dev/sdc1, has HPFS/NTFS mentioned in the System column.
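Spotting the new disk by eye gets tedious when the listing is long. Here is a rough Python sketch that compares the two listings for you, assuming each disk is introduced by a line of the form "Disk /dev/<name>:" as in the output above; the function names are my own.

```python
import re


def disks_in(fdisk_output):
    """Collect the device names from the 'Disk /dev/...:' lines of fdisk -l output."""
    return set(re.findall(r"^Disk (/dev/\w+):", fdisk_output, flags=re.MULTILINE))


def new_disks(before, after):
    """Return the devices present in the second listing but not in the first."""
    return sorted(disks_in(after) - disks_in(before))
```

Save the output of "sudo fdisk -l" to a file before and after plugging in the drive, read both files into strings, and new_disks(before, after) returns the device that appeared.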

Step 3: Now that you know your USB harddisk partition (/dev/sdc1 in this case) and its type (HPFS/NTFS in this case), go ahead and mount it.

vinay@forest:~$ sudo mkdir /mnt/usbdrive
vinay@forest:~$ sudo mount -t ntfs /dev/sdc1 /mnt/usbdrive/

That's it. You can now access your files from /mnt/usbdrive

To unmount:

vinay@forest:~$ sudo umount /mnt/usbdrive

Friday, April 9, 2010

Returning from main: exit vs return


Does it make a difference if the last statement in the main function is a "return" or an "exit"? I spent quite some time understanding the differences. However I am too lazy to consolidate all my findings and write my own notes here. So, here are the links that I found useful. Go through them in order and you will emerge with a good knowledge of the differences.

1. http://bytes.com/topic/c/answers/222362-difference-between-return-exit
2. http://bytes.com/topic/c/answers/221476-whats-difference-return-0-exit-0-exit-1-a
3. http://www.daniweb.com/forums/thread208168.html#
4. http://c-faq.com/ansi/exitvsreturn.html
5. http://c-faq.com/strangeprob/crashatexit.html
6. http://www.gnu.org/s/libc/manual/html_node/Cleanups-on-Exit.html
7. http://www.cplusplus.com/reference/clibrary/cstdlib/atexit/
8. http://stackoverflow.com/questions/461449/return-statement-vs-exit-in-main

Hope that helped :)

Thursday, April 8, 2010

First Post


Since most of my posts are going to be technical, I was interested in finding a way to post code snippets on Blogger. After some Googling around, I found this post on "easy syntax highlighting for blogger", which is exactly what I was looking for. I have followed the instructions and am going to test it out now. Here goes a code snippet...

#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("Hello World!\n");
    return 0;
}

This post by Patrick Webster is also very helpful, especially the sections on "Step 2: Clean the Code" and "Blank Lines in IE".