Downloading content from Rapidshare.com using wget and bash

I was trying to find a program to automate the process of downloading content from rapidshare.com. I used to use RapidGet, but it only works on Windows, and while it’s possible to run it on other platforms like Linux using Wine (I haven’t tried Darwine on OS X but I’m sure it would work), it’s not the ideal solution. I also found some issues with RapidGet, the most irritating being that sometimes it wouldn’t download files correctly and I would have to download the file manually in a browser. One of the other features I wanted was to download files remotely from a different machine. This is also possible using Linux and Wine by forwarding the X11 calls over ssh, but it is still not ideal because you need to be logged into the machine and leave the ssh session open until RapidGet is finished, which is pretty messy. So once rapidshare.de started to forward all their content to rapidshare.com, I decided to bite the bullet and write my own bash script.

To run correctly, the script requires a RapidShare premium account and the following programs and sed script:

sed, wget, list_urls.sed

sed and wget are available on most if not all platforms: Cygwin on Windows, Fink or DarwinPorts on Mac OS X, and the package repositories on Linux or Unix. I use Ubuntu, so in the unlikely situation that wget or sed are not installed you can install them by typing
sudo apt-get install sed wget

list_urls.sed is a sed script available at the sed site on sed.sourceforge.net. It extracts all the hyperlink URLs from a file.

So how did I create the script? With a little bit of detective work and a beer while I looked at how rapidshare.com fitted together. The first thing I did was download LiveHTTPHeaders for Firefox, which allows you to examine the headers of HTTP requests, including GET, POST and redirect data. Using this it became fairly easy to see what was going on. There are three steps to the script:

  1. Login and save the cookie
  2. Using the saved cookie retrieve the actual url of the file
  3. Download the url with wget

To use the script you must provide it with a username, a password and a URL, e.g. downloadFromCom.sh -u username -p password -l link

The first part of the script handles user input

#!/bin/bash
TEMP=`getopt -o u:p:l: --long user:,pass:,url: \
-n 'downloadFromCom.sh' -- "$@"`

if [ $? != 0 ] ; then echo "Error. Correct usage options are \" -u  -p  -l \" Terminating..." >&2 ; exit 1 ; fi
eval set -- "$TEMP"

while true ; do
case "$1" in
-u|--user) echo "Username: $2"; user=$2; shift 2;;
-p|--pass) echo "Password: $2"; pass=$2; shift 2;;
-l|--url) echo "URL: $2"; url=$2; shift 2;;
--) shift ; break ;;
*) echo "Internal error" ; exit 1 ;;
esac
done

RED='\e[1;31m'
CYAN='\e[1;36m'
NC='\e[0m' # No Color

echo REALURL: $url
fileName=`basename $url`
echo FILENAME: $fileName
cookie=cookie

Next it logs into Rapidshare and saves the user cookie

## LOGIN and save cookie ##
wget --save-cookies=$cookie -q --post-data="login=$user&password=$pass" https://ssl.rapidshare.com/cgi-bin/premiumzone.cgi
rm premiumzone.cgi
wget --load-cookies=$cookie -q $url -O $fileName.temp

server=`grep post $fileName.temp | tr " " "\n" | grep action | sed 's/[^"]*"\([^"]*\).*/\1/'`
uri=`grep post $fileName.temp | tr " " "\n" | grep value | sed 's/[^"]*"\([^"]*\).*/\1/'`
#tr " " "\n" replaces all spaces with a new line
#sed 's/[^"]*"\([^"]*\).*/\1/' searches a string and retrieves the content from in between double quotes

echo SERVER: $server
echo URI: $uri
newURL="$server$uri"
echo NEWURL: $newURL
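
To see what the tr and sed pipeline above does in isolation, here is a small sketch run against a made-up line of HTML. The sample markup, the example.com host and the helper name extract_quoted are assumptions for illustration only; the real RapidShare page may well look different.

```shell
#!/bin/bash
# Hypothetical sample line; not taken from a real RapidShare page.
sample='<form name="dl" action="http://rs1.example.com/files/123/demo.zip" method="post">'

# extract_quoted KEYWORD: hypothetical helper wrapping the same pipeline as the script.
# 1. keep lines mentioning "post"
# 2. put every space-separated token on its own line
# 3. keep the token containing KEYWORD
# 4. print whatever sits between the first pair of double quotes
extract_quoted() {
    grep post | tr " " "\n" | grep "$1" | sed 's/[^"]*"\([^"]*\).*/\1/'
}

server=$(printf '%s\n' "$sample" | extract_quoted action)
echo "SERVER: $server"
```

Because the tokens are split on spaces, this approach only works when the attribute value itself contains no spaces.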

It now tries to find the actual URL

## RETRIEVE ACTUAL URL ##
wget --load-cookies=cookie -q --post-data=dl.start=PREMIUM $newURL -O $fileName.temp2
actualURL=`list_urls.sed $fileName.temp2 | grep /files | tail -1`
echo -e "${RED}ACTUALURL: ${CYAN}$actualURL${NC}"
fileName2=`basename $actualURL`

Lastly it downloads the file to the “downloads” directory, which must already exist. The “-b” flag enables background mode in wget, allowing multiple files to download at once.

## DOWNLOAD ACTUAL URL ##
wget -b $actualURL -O ./downloads/$fileName2 --load-cookies=cookie

Finally, it does some clean-up

## CLEAN UP ##
rm $cookie
rm $fileName.temp
rm $fileName.temp2

This script will only download one file and then exit. What if we have a lot of URLs that we want to download? Simple: just put all your links into a text file, one per line, and pass that file as an argument to a second script, e.g. ./download.sh urls.txt

#!/bin/bash
for url in `cat $1`
do
./downloadFromCom.sh -u username -p password -l $url
done

All you have to do is make sure that you have edited this script and entered your username and password.

Now that we have run the script and the files are downloading, how do we know when they are finished? With this little script of course.

#!/bin/bash
j=0
while true
do
clear
echo "===       Iteration $j    ==="
for i in `ls ./wget-log*`
do
head -1 $i
saved=`grep saved $i`
if [ -z "$saved" ]; then
tail -3 $i | head -1 #tail -1 $i
else
tail -2 $i | head -1 #tail -3 $i | head -1
fi
done
let j++
sleep 3
done

All this does is poll the logs that wget produces and continually print a summary of the logs it finds in the current directory. You will have to kill it manually with ctrl+c. I could have made it quit by itself once all the files have been successfully downloaded, but it’s not really necessary.
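
For anyone who does want the monitor to stop on its own, here is a minimal sketch of the check it would need. It relies on the same assumption as the script above, namely that a finished wget log contains the word “saved”. The function name all_saved is my own invention for this example.

```shell
#!/bin/bash
# all_saved [DIR]: hypothetical helper.
# Succeeds only if DIR (default: current directory) holds at least one
# wget-log* file and every one of them contains the word "saved".
all_saved() {
    local dir=${1:-.} log found=1
    for log in "$dir"/wget-log*; do
        [ -e "$log" ] || continue          # the glob matched nothing
        found=0
        grep -q saved "$log" || return 1   # this download is still running
    done
    return $found
}
```

Inside the polling loop, a line such as all_saved && break placed just before the sleep would end the loop once every log reports a saved file.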

So there we go: a few scripts to download content from RapidShare. You could of course use the same process to write some scripts for other sites.

Here’s a zip file containing all the scripts: RapidShare Download Scripts

Listing installed packages in Ubuntu linux

I was trying to find out how to list the currently installed packages in Ubuntu. I was assuming that there would be an option flag for either apt-get or apt-cache but I couldn’t find one. Not saying that there isn’t one, but the following command does the job:
dpkg --get-selections

Lovely
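
The output of that command is one line per package, with the package name followed by its selection state. As a small sketch, the list can be narrowed to names marked “install” with a one-line awk filter. The function name installed_only and the output file name are made up for this example.

```shell
#!/bin/bash
# installed_only: hypothetical helper.
# Reads "package<whitespace>state" lines on stdin, the format printed by
# dpkg --get-selections, and prints only the names whose state is "install".
installed_only() {
    awk '$2 == "install" { print $1 }'
}

# Typical use on a Debian or Ubuntu machine:
#   dpkg --get-selections | installed_only > packages.txt
```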

Welcome to Mark On Tech

Yahoo… it’s finally up. After a long and hard search for a decent domain name, I’ve finally settled on one, in fact make that six: markontech.com, .net, .org and markontechnology.com, .net, .org. Probably a little overkill, but at least they’re mine, all mine (cue evil laugh, muhahaha). It’s taken me over a year of humming and hawing about getting some hosting sorted. I’ve gone through some crazy number of whois queries, none of which led to something that I was happy with. Nearly everything I thought of was taken.

My criteria for the domain name were: something short, preferably a maximum of five to six characters; something that would be easy to remember; and something which (if it wasn’t actually a correct English word) a person would be able to spell on hearing it, e.g. boop, goop, droop. You can see there’s a running pattern of vowels in my ideal domain names. Probably something to do with the fact that I may have asked for too many of them from Carol and didn’t know what to do with them (a sure throw them in there and see what happens). The main problem I came across was the sheer number of domains that weren’t even being used. A lot of them were just placeholders or the usual “if you’re interested, buy me now” for some ridiculous amount of money.

There has to be a better way of organizing the registration process so that domain names actually go to organizations or people who deserve them, instead of to squatters. I can’t think of one off the top of my head, but what would be fun is some sort of competition where whoever won would be awarded the domain name, something like the virtual reality or physical ability section in The Krypton Factor. Here’s Gordon trying his best to keep things at a mature level.


Shooting Stars - Virtual Reality

I’ve decided on using WordPress as my content manager for the site (initially). I’ve used WordPress before on my local machine for some simple projects, but this is going to be my first large project. I like the way it’s easy to get it up and running fairly quickly and start adding content; it also seems fairly mature at this stage. I used to use the old 1.5 version of WordPress and the transition to version 2 is quite pleasant. The admin side of WordPress is far more intuitive and much nicer to work with. There were a few issues, however. I have a confession to make: my spelling is terrible (c’mon, I’m a computer scientist), so one of the main features I was looking for was a spell checker.

I had a look at Live Spell Checker 0.4, which claimed to be an AJAX-powered spell checker that integrated into the WYSIWYG editor of WordPress. Sounds great, I thought; not much configuration, it said, just throw it into the plugins directory in wp-content. It produced some errors when I tried to activate it. I had a look around and found a newer version, 0.6, so I tried that. The install went OK and it appeared in the editor. “Cool,” I thought, “looks promising,” but it didn’t work. I tracked down the problem to what I think is an issue with the server I’m using. Live Spell Checker uses pspell, the PHP spelling utility, and seems to fall back to GNU aspell if it doesn’t find it. When I ran it, it timed out, so initially I thought the GoDaddy servers didn’t have it installed, although it did seem strange that they wouldn’t have aspell installed, as it’s a fairly standard tool. It was at that point I gave up. I had already wasted 15 mins trying to get it working and I was irritated at the two versions not working straight out of the box.
It does look as if the WordPress developers are working on a spell checker for the next main release of WordPress, which is 2.5. Then I realized that the new version of Firefox has spell checking as standard. Downloaded the new version and bingo, spell checking on the fly… groovy!

One of the other things I wanted was the ability to embed Flash-based video into a post. WordPress rips out all the embed HTML, so I was pleased to stumble across WPvideo, which allows you to embed your YouTube, Yahoo, Google and Metacafe videos by enclosing them in video tags.

I’ll see how WordPress turns out, but I can imagine that I’ll write my own, probably using Java and RIFE. Mind you, I’m learning Ruby on Rails and it seems pretty similar, so it just depends on performance issues then.

So what can you look forward to on markontech.com? Well, other than my utter ramblings about nothing too important, I intend to impart the little knowledge I have about technology on to the masses. You may find it interesting, you may not; only time will tell.