WEBMIRROR PRO v2.0 User's Guide
There are two ways to use WEBMIRROR PRO v2.0. The first is to define the retrieval
domain, start page and other parameters directly on the command line.
However, this method exists only for compatibility, to support those who used an
earlier version of WEBMIRROR PRO in command files. It is not recommended for new
users. The command line parameters are documented separately.
The recommended way to start WEBMIRROR PRO v2.0 is to invoke it from the shell command
line, specifying a Retrieval Definition File (RDF):
webmirror -f mypages.RDF
You can use any extension for the RDF file. Using .rdf is only a
recommendation. WEBMIRROR PRO v2.0 then retrieves pages according to the
definitions given in the RDF file. The RDF file is a plain text file that you
can create with any text editor, such as Notepad or vi.
If you specify STDIN as the file name, webmirror reads the RDF information from
standard input.
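Reading from standard input is mainly useful when the definition is generated by another program. The following Python sketch shows one way this might look. It only assumes what this guide states: that `webmirror -f STDIN` reads the RDF from standard input and that each line is a command name followed by its parameter. The helper names `build_rdf` and `run_webmirror` are hypothetical, and the parameter forms used for `start` and `level` (a URL and a number) are guesses for illustration, not documented syntax.

```python
import subprocess


def build_rdf(start_url, levels):
    """Build RDF text: one command per line, command name then parameter."""
    lines = [
        "# generated definition",
        f"start {start_url}",  # assumption: start takes a URL as its parameter
        f"level {levels}",     # assumption: level takes a number as its parameter
    ]
    return "\n".join(lines) + "\n"


def run_webmirror(rdf_text, executable="webmirror"):
    """Pipe rdf_text to 'webmirror -f STDIN' and return the exit code."""
    result = subprocess.run(
        [executable, "-f", "STDIN"],  # STDIN as the file name, as described above
        input=rdf_text,               # sent to the child's standard input
        text=True,
    )
    return result.returncode
```

Calling `run_webmirror(build_rdf("http://www.example.com/", 2))` would start a retrieval, provided the webmirror executable is on the search path.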
You can also run
webmirror -h
for a short help message.
RDF file format
The RDF contains configuration lines, one configuration option per line. Leading
and trailing spaces on a line are ignored. Empty lines and lines starting with a # are
treated as comments. The commands in an RDF file are case sensitive.
The command name is the first word of a line, up to the first space. The rest of the
line is the parameter for the command. The config command provides a simple way to
include the content of other files in an RDF file.
The following commands are available:
- agent define the user agent to report to the server.
- allframes retrieve frames outside the defined domain.
- allpictures retrieve pictures outside the defined domain.
- auth define authentication.
- believe content length information.
- directory define the root directory for the retrieved files.
- cookies define how to handle cookies.
- exclude define exclude patterns for the domain.
- include define include patterns for the domain.
- interface define interface to be used for retrieval.
- leafs create leaf pages (default).
- level define how many levels to retrieve.
- log define log file.
- map choose file name creation method.
- noauth define no authentication domain.
- noleafs do NOT create leaf pages.
- pagesizelimit limit the maximum file size.
- nopictures do NOT retrieve pictures.
- pictures retrieve pictures (default).
- proxy define proxies.
- directget use old HTTP/1.0 URLs in GET when not using proxies.
- regexp use Perl regular expressions for patterns.
- size define the total size of the retrieval.
- start define start URLs.
- unbelieve content length information.
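The line rules described under "RDF file format" can be sketched as a small Python parser. This is an illustrative reading of the text above, not WEBMIRROR PRO's own code. The function names `parse_rdf` and `unknown_commands` are hypothetical, the command set is copied from the list in this guide plus config, and two details are assumptions: tabs are treated like spaces, and extra spaces before the parameter are skipped. The parameter values in the sample are guesses as well.

```python
# Command names copied from the list in this guide, plus "config".
KNOWN_COMMANDS = {
    "agent", "allframes", "allpictures", "auth", "believe", "directory",
    "cookies", "exclude", "include", "interface", "leafs", "level", "log",
    "map", "noauth", "noleafs", "pagesizelimit", "nopictures", "pictures",
    "proxy", "directget", "regexp", "size", "start", "unbelieve", "config",
}


def parse_rdf(text):
    """Return a list of (line_number, command, parameter) tuples."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # leading and trailing whitespace is ignored
        if not line or line.startswith("#"):
            continue  # empty lines and '#' lines are comments
        # The command is the first word, up to the first space.
        command, _, parameter = line.partition(" ")
        entries.append((number, command, parameter.lstrip()))
    return entries


def unknown_commands(entries):
    """Return entries whose command is not listed (case-sensitive match)."""
    return [entry for entry in entries if entry[1] not in KNOWN_COMMANDS]


sample = """\
# illustrative definition; parameter values are guesses
start http://www.example.com/
  level 2

Level 3
nopictures
"""

print(unknown_commands(parse_rdf(sample)))  # prints [(5, 'Level', '3')]
```

Because commands are case sensitive, `Level` on line 5 is reported as unknown while the indented `level` on line 3 is accepted. A check like this can catch misspelled commands before a long retrieval is started.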