I was looking for a program to automate downloading content from rapidshare.com. I used to use RapidGet, but it only works on Windows. It is possible to run it on other platforms, such as Linux using Wine (I haven’t tried Darwine on OS X, but I’m sure it would work), but that is not an ideal solution. I also ran into problems with RapidGet itself, the most irritating being that it sometimes wouldn’t download files correctly and I would have to fetch them manually in a browser. Another feature I wanted was the ability to start downloads remotely on a different machine. That is possible with Linux and Wine by forwarding the X11 calls over ssh, but it still isn’t ideal: you need to be logged into the machine and leave the ssh session open until RapidGet is finished, which is pretty messy. So once rapidshare.de started forwarding all their content to rapidshare.com, I decided to bite the bullet and write my own bash script.
The script requires a RapidShare premium account and the following programs and script to run correctly: sed, wget and list_urls.sed.
sed and wget are available on most, if not all, platforms: Cygwin on Windows, Fink or DarwinPorts on Mac OS X, and the package repositories of Linux and Unix systems. I use Ubuntu, so in the unlikely situation that wget or sed is not installed, you can install them by typing
sudo apt-get install sed wget
list_urls.sed is a sed script available from the sed site at sed.sourceforge.net. It extracts all the hyperlink URLs from a file.
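Before running anything, it can help to confirm that the tools are actually present. This is only a minimal sketch: it assumes list_urls.sed has been made executable and placed somewhere on your PATH, and the function name missing_tools is my own invention for illustration.

```shell
#!/bin/bash
# Print the name of every required tool that cannot be found on the PATH.
# missing_tools is a hypothetical helper name, not part of the original scripts.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools sed wget list_urls.sed)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are printed on one line.
    echo "Missing tools:" $missing >&2
fi
```

If nothing is printed, all three were found.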
So how did I create the script? With a little bit of detective work and a beer while I looked at how rapidshare.com fitted together. The first thing I did was download the LiveHTTPHeaders extension for Firefox, which lets you examine the headers of HTTP requests, including GET, POST and redirect data. Using this, it became fairly easy to see what was going on. There are three steps to the script:
- Login and save the cookie
- Using the saved cookie retrieve the actual url of the file
- Download the url with wget
To use the script you must provide it with a username, a password and a URL, e.g. downloadFromCom.sh -u username -p password -l link
The first part of the script handles user input
#!/bin/bash
TEMP=`getopt -o u:p:l: --long user:,pass:,url: \
-n 'downloadFromCom.sh' -- "$@"`
if [ $? != 0 ] ; then echo "Error. Correct usage options are \" -u -p -l \" Terminating..." >&2 ; exit 1 ; fi
eval set -- "$TEMP"
while true ; do
case "$1" in
-u|--user) echo "Username: $2"; user=$2; shift 2;;
-p|--pass) echo "Password: $2"; pass=$2; shift 2;;
-l|--url) echo "URL: $2"; url=$2; shift 2;;
--) shift ; break ;;
*) echo "Internal error" ; exit 1 ;;
esac
done
RED='\e[1;31m'
CYAN='\e[1;36m'
NC='\e[0m' # No Color
echo "REALURL: $url"
# Quote the URL so characters such as & or ? are not interpreted by the shell
fileName=`basename "$url"`
echo "FILENAME: $fileName"
cookie=cookie
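The option loop above accepts the three options but never checks that all of them were actually given. A small sketch of such a check follows; have_all_args is a hypothetical helper name, not part of the original script, and in the real script you would probably follow the usage message with exit 1.

```shell
#!/bin/bash
# Return 0 only when username, password and URL are all non-empty.
# have_all_args is a hypothetical helper, not part of the original script.
have_all_args() {
    [ -n "$1" ] && [ -n "$2" ] && [ -n "$3" ]
}

# $user, $pass and $url are the variables set by the option loop.
have_all_args "$user" "$pass" "$url" \
    || echo "Usage: downloadFromCom.sh -u username -p password -l link" >&2
```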
Next it logs into Rapidshare and saves the user cookie
## LOGIN and save cookie ##
wget --save-cookies=$cookie -q --post-data="login=$user&password=$pass" https://ssl.rapidshare.com/cgi-bin/premiumzone.cgi
rm premiumzone.cgi
wget --load-cookies=$cookie -q $url -O $fileName.temp
server=`grep post $fileName.temp | tr " " "\n" | grep action | sed 's/[^"]*"\([^"]*\).*/\1/'`
uri=`grep post $fileName.temp | tr " " "\n" | grep value | sed 's/[^"]*"\([^"]*\).*/\1/'`
#tr " " "\n" replaces all spaces with a new line
#sed 's/[^"]*"\([^"]*\).*/\1/' searches a string and retrieves the content from in between double quotes
echo SERVER: $server
echo URI: $uri
newURL="$server$uri"
echo NEWURL: $newURL
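To see what the tr and sed pipeline does, here it is run against a single made-up line of HTML. The sample line and host name are assumptions for illustration only; the real RapidShare page will look different.

```shell
#!/bin/bash
# A made-up line of HTML, standing in for what the download page might contain.
sample='<form action="http://rs1.example.com/files/123/file.zip" method="post">'

# Same pipeline as the script: split on spaces, keep the "action" attribute,
# then print whatever sits between the first pair of double quotes.
extracted=$(echo "$sample" | grep post | tr " " "\n" | grep action | sed 's/[^"]*"\([^"]*\).*/\1/')

echo "$extracted"
# prints http://rs1.example.com/files/123/file.zip
```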
It now tries to find the actual URL
## RETRIEVE ACTUAL URL ##
wget --load-cookies="$cookie" -q --post-data=dl.start=PREMIUM "$newURL" -O "$fileName.temp2"
actualURL=`list_urls.sed $fileName.temp2 | grep /files | tail -1`
echo -e "${RED}ACTUALURL: ${CYAN}$actualURL${NC}"
fileName2=`basename $actualURL`
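If list_urls.sed is not to hand, something similar can be approximated with grep and sed. This is a rough stand-in rather than an equivalent: it assumes the links appear as double-quoted href attributes, it relies on grep -o (available in GNU and BSD grep), and extract_hrefs and the sample page are made up for illustration.

```shell
#!/bin/bash
# extract_hrefs is a hypothetical stand-in for list_urls.sed: it prints the
# value of every double-quoted href attribute found in the given file.
extract_hrefs() {
    grep -o 'href="[^"]*"' "$1" | sed 's/^href="//; s/"$//'
}

# Made-up page content for illustration only.
page=$(mktemp)
cat > "$page" <<'EOF'
<a href="http://example.com/help.html">Help</a>
<a href="http://rs1.example.com/files/123/file.zip">Mirror 1</a>
<a href="http://rs2.example.com/files/123/file.zip">Mirror 2</a>
EOF

# As in the script, keep only the /files links and take the last one.
lastLink=$(extract_hrefs "$page" | grep /files | tail -1)
echo "$lastLink"
# prints http://rs2.example.com/files/123/file.zip
rm -f "$page"
```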
Lastly, it downloads the file to the “downloads” directory, which must already exist. The “-b” flag enables background mode in wget, allowing multiple files to download at once.
## DOWNLOAD ACTUAL URL ##
wget -b "$actualURL" -O "./downloads/$fileName2" --load-cookies="$cookie"
Finally, it does some clean-up
## CLEAN UP ##
rm $cookie
rm $fileName.temp
rm $fileName.temp2
This script will only download one file and then exit. What if we have a lot of URLs that we want to download? Simple: put all your links into a text file, one per line, and pass that file as an argument to the following script, e.g. ./download.sh urls.txt
#!/bin/bash
for url in `cat $1`
do
./downloadFromCom.sh -u username -p password -l $url
done
All you have to do is make sure that you have edited this script and entered your username and password.
Now that we have run the script and the files are downloading, how do we know when they are finished? With this little script, of course.
#!/bin/bash
j=0
while true
do
clear
echo "=== Iteration $j ==="
for i in ./wget-log*
do
head -1 $i
saved=`grep saved $i`
if [ -z "$saved" ]; then
tail -3 $i | head -1 #tail -1 $i
else
tail -2 $i | head -1 #tail -3 $i | head -1
fi
done
let j++
sleep 3
done
All this does is poll the logs that wget produces and continually print a summary of the logs it finds in the current directory. You will have to kill it manually with Ctrl+C. I could have made it quit by itself once all the files have downloaded successfully, but it’s not really necessary.
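For anyone who does want the monitor to stop on its own, here is one way it might be done. It is only a sketch: all_saved is a hypothetical helper, and it assumes, like the monitoring script above, that a finished download leaves the word "saved" in its log. A download that fails never writes that word, so this check would keep waiting for it.

```shell
#!/bin/bash
# all_saved is a hypothetical helper: it succeeds only if at least one log
# exists in the given directory and every log contains the word "saved",
# the same marker the monitoring script greps for.
all_saved() {
    local log found=0
    for log in "$1"/wget-log*; do
        [ -e "$log" ] || continue      # the glob matched nothing
        found=1
        grep -q saved "$log" || return 1
    done
    [ "$found" -eq 1 ]
}

# Possible use inside the polling loop:
#   if all_saved . ; then echo "All downloads finished"; break; fi
```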
So there we go: a few scripts to download content from RapidShare. You could of course use the same process to write scripts for other sites.
Here’s a zip file containing all the scripts. RapidShare Download Scripts
