
linux: recursively downloading files from an FTP server protected by a robot

I needed to recursively download a set of files from an FTP server. As usual, my first choice for the job was wget.

When I ran the command, the following interesting message showed up:

Note: I replaced the original FTP location with <ftp_location>.


Saving to: ‘<ftp_location>/robots.txt’

public.dhe.ibm.com/robots 100%[=====================================>] 131 --.-KB/s in 0s

2017-04-05 09:34:37 (13.1 MB/s) - ‘<ftp_location>/robots.txt’ saved [131/131]

FINISHED --2017-04-05 09:34:37--
Total wall clock time: 6.0s
Downloaded: 2 files, 1.9K in 0s (65.9 MB/s)

Instead of downloading the files I wanted, wget downloaded a file called robots.txt, whose content politely tells robots to go away 🙂


# go away
User-agent: *
Robot-version: 2.0
Allow: /support/knowledgecenter
Allow: /softsupt/os2ddpak
Allow: /download
Disallow: /

From this reference: “...web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol…”. wget honors robots.txt during recursive retrieval, so in my case the rules in that file kept it from fetching the files I wanted.
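
To make the problem a bit more concrete, here is a minimal sketch of how one could check whether a saved robots.txt carries a blanket "Disallow: /" rule like the one above. The function name has_blanket_disallow is my own invention, and the check is deliberately naive: it ignores User-agent groups and Allow overrides, so treat it as an illustration rather than a real robots.txt parser.

```shell
# Hypothetical helper (not part of wget or of the original command):
# succeeds when the given robots.txt file contains a line that is exactly
# "Disallow: /" (case-insensitive, surrounding whitespace tolerated).
# Simplification: User-agent groups and Allow overrides are ignored.
has_blanket_disallow() {
    grep -Eiq '^[[:space:]]*Disallow:[[:space:]]*/[[:space:]]*$' "$1"
}

# Usage, assuming wget saved the file as ./robots.txt:
# if has_blanket_disallow robots.txt; then
#     echo "robots.txt disallows everything that is not explicitly allowed"
# fi
```

Run against the file shown above, the function should succeed, because the last rule is "Disallow: /".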

To get around this, I used the following command (it is fairly self-explanatory):


wget -r --no-parent -l1 -e robots=off --wait 1 <FTP_DIR_URL>

Where:

wget: utility for non-interactive download of files from the Web (see man wget)

-r: retrieve recursively

--no-parent: never ascend to the parent directory when retrieving recursively

-l1: limit the recursion depth to one level

-e: execute command as if it were a part of .wgetrc

robots=off: the .wgetrc command passed to -e; it tells wget to ignore robots.txt

--wait 1: wait 1 second between retrievals
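
For readers who prefer long option names, the same command can be sketched as a small shell function. The name fetch_one_level is hypothetical (it is not something wget provides); the options themselves are the long forms of the flags listed above.

```shell
# Hypothetical wrapper: the function name is an assumption for illustration.
# It runs the same wget command, spelled with long options:
#   --recursive        same as -r
#   --level=1          same as -l1
#   --execute CMD      same as -e CMD
fetch_one_level() {
    wget --recursive --no-parent --level=1 \
         --execute robots=off --wait=1 "$1"
}

# Usage (replace the placeholder with the real directory URL):
# fetch_one_level '<FTP_DIR_URL>'
```

Keeping the one-second wait seems like the polite thing to do, since ignoring robots.txt already goes against what the site owner asked for.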

It works 🙂
