WebReaper v10

Developed using the CodeJock Xtreme Toolkit

WebReaper Version History

Version Changes/Additions


Smart ordering when downloading - images, etc., are given priority, which should result in more complete pages being downloaded when a site download is stopped before completion. There are also options to replace .ASP/.PHP/.ASPX file extensions with .HTML, which works better for local browsing, and to automatically add an .HTML extension to files which previously had no extension. A new option includes the download date as part of the local folder name. Support has also been added for downloading images referenced in CSS files. This version is compiled with Visual Studio .NET.
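The extension handling described above might look roughly like the following Python sketch. The function name `local_filename`, the `is_html` flag and the exact set of rewritten extensions are assumptions for illustration, not WebReaper's actual code. The sketch also assumes any query string has already been removed from the name.

```python
import posixpath

# Assumed set of server-side script extensions to rewrite (illustrative only).
SCRIPT_EXTENSIONS = {".asp", ".php", ".aspx"}


def local_filename(name: str, is_html: bool = True) -> str:
    """Return a file name that a browser will open as HTML from disk.

    Hypothetical helper: `name` is the last path segment of a URL with any
    query string already removed; `is_html` says whether the server
    reported the resource as an HTML page.
    """
    if not is_html:
        return name  # images, archives, etc. keep their original names
    stem, ext = posixpath.splitext(name)
    if ext.lower() in SCRIPT_EXTENSIONS:
        return stem + ".html"  # e.g. index.asp -> index.html
    if not ext:
        return name + ".html"  # e.g. about -> about.html
    return name


print(local_filename("index.asp"))                # index.html
print(local_filename("news.PHP"))                 # news.html
print(local_filename("about"))                    # about.html
print(local_filename("logo.gif", is_html=False))  # logo.gif
```

Whatever renaming scheme is used, the links inside the saved pages must be rewritten to the same names. Otherwise local browsing breaks.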


Fix to the Favourites code to make it more robust - in particular fixing a problem which caused the application to fail to start on NT4 systems.


I've had a few reports of stability problems since v9.3, and this version fixes a couple of key problems which seemed to be causing them. There's also improved link searching within Java code, as well as the introduction of the 'Favourites' menu from Internet Explorer, allowing Favourite sites to be reaped simply by selecting them from the menu.


There were some problems with v9.5, as Win9x was unable to load some of the resources due to limitations in the OS. This meant that Win95/98 reported the executable as being corrupt, whilst NT/2K could load/run it fine. In order to avoid confusion, I flipped the version number up one once this was fixed.


A minor update with some fixes and improvements. In particular, this version fixes the "Failed to Create Empty Document" bug which stopped the application from starting.


A few minor fixes and optimisations, and a new version of the installer. In particular, URL content filters are now case-insensitive.


GetRight® support - users running GetRight can switch on an option which will force all files over a certain size to be passed to GetRight for segmented & resumable downloads.


Added Macromedia/Shockwave Flash support.


Removed advertising system. Added keyword/content filter. Fixed 'base' tag problem.


Added the ability for registered users to backup their registration data, in case they want to move their registration to another machine, or reinstall WebReaper.


Tiny fix for some people who were finding that v8 was still minimizing on startup.


This version resolves many of the existing problems that have been reported to me, and adds a few extra features. In particular:

  • New Filter Wizard, to make the construction of filters simpler.
  • New Help system - Windows-style with full search/index.
  • Fixed a problem with file time comparison, which means that 'refreshing' a previously downloaded site is much faster - only changed objects will be downloaded. Previously, the local/remote timestamp comparison didn't always work.
  • Link adjustment now works correctly across the entire locally saved site. Previously, some links weren't adjusted correctly, meaning that the local site couldn't always be browsed correctly.
  • Maximum size filter fixed (previous versions had bigger/smaller swapped).
  • View positions and column widths are now remembered between WR sessions.
  • 'Minimize at startup' bug fixed.
  • Fixed the bug where the error 'Unhandled error during download' is occasionally displayed, and the app needs to be restarted.
  • New option on the Threads page of the Options dialog, which forces WebReaper to wait for any currently downloading files to complete before the download is aborted.
  • Statistics written to the log, showing the number of files downloaded/skipped/failed.
  • Added '-logurls' command-line option, which will generate a log of all URLs processed, written to webreaper_url.log.
  • Plenty of other bug-fixes.

This version sees the introduction of the advertising system built into WebReaper. I know some users aren't keen on this move, but support for the application has grown into a time-consuming job, which has to pay for itself somehow.


Finally fixed the registry corruption bug! Also, tweaked the referrer and redirection handling code so that more sites will download correctly (previously the wrong referrer string was being passed to the server, causing it to return 'Access Forbidden'). URL profiles now store all settings, filters and the URL, and can be embedded in batch files.


Full release of previous beta, with new rewritten documentation, and lots of fixes, speedups and improvements. Also added the voluntary registration - donations towards development costs gratefully accepted. :-)


Beta version released with redesigned configurable filters, better batch handling, simpler configuration options and many speed increases.


Okay, so the flickering was still there - albeit whilst the application was minimized. But now it's really fixed...


A quick little release to stop the desktop icons flickering during reaping sessions. It also fixes some problems which were causing some links in HTML files not to be adjusted for browsing. Also in this release, there's a new option which allows you to specify that the limits (max size, min time, etc.) apply to HTML files, binary files or both. Leaving the log file location blank will now result in no log file at all, rather than the previous 'webreaper.log' in the current directory.


It's been a while, but finally here's the next version. It's got a prettier interface, some major bugfixes, and some new features too. I've tried to fix as many of the bugs you've told me about as is possible, but keep 'em coming - it's very helpful (and saves me from doing so much testing... ;-)

Here's the list of major bits which have changed:

  • Some fixes in the link parsing code
  • URL Exclusions added to options dialog.
  • Log file, with user-defined location
  • Temporary/output files should now be closed properly and deleted too.
  • Last modified date column in details pane.
  • New link processing order - inline images/objects are downloaded before other links are followed.
  • New colours in links tree: black/pending, red/skipped, green/downloaded, grey/excluded.
  • New icons in details pane, and redraw optimised to reduce flickering.
  • Pause/Resume download - the stop button now pauses a download (the number of links remaining to resume is indicated in the status bar) allowing links to be included/excluded by hand before resuming the download.
  • 'Gentle mode' allows inter-object delay to save thrashing servers (single-thread mode only).
  • Major memory leak fixed.
  • Single executable - smaller and no dependency on MFC42.DLL.
  • All links fixed up, even if they're below the maximum crawl depth.


Not as many new features or fixes as I'd have liked, but it was about time to get a new version released. Lots of tweaks and little fixes of things which weren't very polished in v6.0.

Some of the more exciting changes:

  • Shortcut for locally browsable files now named with page title
  • New option 'intelligent modem dial'. If the application seems to have difficulty in detecting a currently active internet connection, unchecking this option should help.
  • New option to set the user-agent string. For advanced users, this allows WebReaper to 'spoof' as other browsers, enabling browser-tailored pages to be reaped (although I think that this goes completely against the 'generic browsing' nature of the internet, but hey, that's Microsoft and Netscape for you...)
  • New option on the 'Filter' page to stop the download of inline images. Previously, the only way to filter them was to set a size limit.
  • Command-line support for batch files should now work

I know there are a few fixes missing which I did promise, but I've been very busy and just haven't had much time to work on WebReaper. Hopefully in a couple of weeks I'll have more time to catch up on the massive list of requested features.


Well, it was about time I started behaving myself and adhering to standards, and a couple of severe telling-offs from server administrators have led to me finally putting in a complete and proper implementation of the 'robots.txt' exclusion standard. There is no override for this; if an administrator doesn't want you reaping certain parts of their server, then that's their prerogative. However, you could always mail them and ask for special permissions for the 'webreaper' user-agent. I ask that you move to this new version as soon as possible - I've had various reports of WebReaper causing major problems by reaping servers from which it should have been prohibited.
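Python's standard library includes a parser for the robots.txt exclusion standard, so a check of this kind can be sketched with `urllib.robotparser`. The robots.txt content and the example.com URLs below are made up for illustration. A real crawler would fetch `/robots.txt` from each server before requesting any other page, for example with `RobotFileParser.set_url()` followed by `read()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for the 'webreaper' user-agent and a
# catch-all record for every other robot.
ROBOTS_TXT = """\
User-agent: webreaper
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""


def make_checker(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    checker = RobotFileParser()
    checker.parse(robots_txt.splitlines())
    return checker


checker = make_checker(ROBOTS_TXT)

# A robot uses the record that names it and falls back to the '*' record only
# when no record names it, so 'webreaper' is kept out of /private/ but not
# out of /cgi-bin/, while other robots get the '*' rules.
print(checker.can_fetch("webreaper", "http://example.com/private/a.html"))  # False
print(checker.can_fetch("webreaper", "http://example.com/cgi-bin/x"))       # True
print(checker.can_fetch("OtherBot", "http://example.com/cgi-bin/x"))        # False
```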

Please note: I have also had a few complaints about WebReaper being run constantly on a server over a long period (6 hours, in one case). I have decreased the maximum download time to 2 hours, but if you are planning a 'major reap' of a site (i.e. > 10-15 minutes), please contact the administrator first to ask permission. Leaving WebReaper running for long periods can cause major problems on servers, which could then lead to the services provided there being withdrawn. Common courtesy never hurt anyone.

If the complaints continue to increase I may have to start distributing WebReaper with licences so that users who abuse webservers can be identified - this extra administration would almost certainly require me to change the Freeware status of the software.

Also in this release is complete support for the 'If-Modified-Since' HTTP header, which is a much more efficient way of fetching files: as well as reducing server load, it also cuts download times. This feature kicks in whenever the 'check for changes' option is checked when reading local files, and also when the new option limiting downloads to files which have changed in the last n days is used.
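The two uses of `If-Modified-Since` described above can be sketched in Python as follows. The helper names are hypothetical. For the 'check for changes' case, the `since` value would be the local file's modification time, for example from `os.path.getmtime`. A server that has nothing newer answers 304 Not Modified with no body, and urllib reports that reply as an `HTTPError`, which is why the fetch helper catches it.

```python
import time
import urllib.error
import urllib.request
from email.utils import formatdate
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def conditional_request(url: str, since: float) -> urllib.request.Request:
    """GET request asking the server to send the body only if the resource
    has changed after `since` (a Unix timestamp)."""
    request = urllib.request.Request(url)
    # HTTP dates are always given in GMT, e.g. 'Thu, 01 Jan 1970 00:00:00 GMT'.
    request.add_header("If-Modified-Since", formatdate(since, usegmt=True))
    return request


def changed_in_last_days(url: str, days: int,
                         now: Optional[float] = None) -> urllib.request.Request:
    """Conditional request for 'only files changed in the last n days'."""
    now = time.time() if now is None else now
    return conditional_request(url, now - days * SECONDS_PER_DAY)


def fetch_if_modified(request: urllib.request.Request) -> Optional[bytes]:
    """Return the new body, or None if the server answered 304 Not Modified."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged: keep the local copy
            return None
        raise
```

`Request.add_header` normalises the header name to `If-modified-since`. Pass that spelling to `get_header` when inspecting the request.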

I've also added a thread monitor, and cut down the number of states (and therefore the number of required refreshes) for the details/tree panes. This should reduce the amount of work being done by the CPU, whilst still allowing you to keep up with the progress of the threads.

Oh, and of course the usual batch of little fixes for bugs which you've all been telling me about. ;-)


Never one to rest on my laurels, me. This version allows the robots.txt 'ban' to be overridden, since (as you've all told me) just about every site seems to have a robots.txt file on it. In the end, to avoid website owners' wrath, I've left the decision to the user - with a suitable warning. However, any site which really doesn't want WebReaper crawling its pages can override this (mail me for details).

This release also allows you to save files locally without the IE cache being filled (saving hard disk space) using the new "Don't save to IE cache" option.

Other fixes: the intelligent connect stuff now works correctly if you're using a proxy server, and running WebReaper with a URL as a command-line argument will immediately kick off a download.


It never rains but it pours. It seems that 5.2 & 5.3 kept resolving URLs down to http:///, which doesn't work very well. I promise that this version works. The tooltips in the main details pane had also stopped working, so I've fixed them too.


Ahem. Well. Errm. So, let's just forget about v5.2, shall we? Not the most successful of releases - the bookmark bug was still present (just slightly different) and the 'intelligent modem connect/disconnect' feature was actually pretty dumb - it would disconnect the modem after each file was downloaded, and then redial for the next one... Doh!

Anyway, things should be much better now. ;-)


  • Fix for the bug where bookmarks ('#') weren't processed correctly.
  • Fixed a nasty memory leak.
  • Robots.txt support to stop WebReaper crawling sites. Also, a page will not have its links followed if a meta tag is present with the name set to 'robots' and the content set to 'nofollow'.
  • Dockbars for the batch and log windows.
  • Better link parsing - including bookmarks and BASE tag support.
  • Clean up temp (~.tmp) files during downloads.
  • Redesigned documentation (thanks to Carl Osterly)
  • Intelligent connection/disconnection before and after sessions.
  • Automatic software updates - WebReaper will check this website for new versions

Plus a few other minor fixes.
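The meta-tag rule in the list above can be sketched with Python's `html.parser`. The class and function names are hypothetical. Real pages may list other directives, such as `noindex`, in the same comma-separated `content` value, and this sketch ignores them.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Looks for <meta name="robots" content="..."> and records 'nofollow'."""

    def __init__(self) -> None:
        super().__init__()
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        directives = {d.strip().lower()
                      for d in attributes.get("content", "").split(",")}
        if "nofollow" in directives:
            self.nofollow = True


def may_follow_links(html: str) -> bool:
    """False if the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return not parser.nofollow


print(may_follow_links('<meta name="ROBOTS" content="noindex, NOFOLLOW">'))  # False
print(may_follow_links("<a href='next.html'>next</a>"))                      # True
```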


Minor release - fixes a slight parsing problem to do with bookmarks.


Major new release - the one you've all been waiting for! New features include:

  • HTML links in locally saved pages can now be adjusted for local browsing. Websites saved locally should now be browsable with any browser (Netscape, IE, etc.).
  • Lots of HTML parsing fixes - most pages should work now.
  • Proxy/Password support. Enter a username and password on the Security tab in the Options dialog. Also, option to be prompted if the current password doesn't work.
  • Newly designed simple screen for Min/Max limit options.
  • Details list sortable by clicking on headers - toggles through ascending, descending & unsorted.
  • URL Profiles - save a complete set of options associated with a particular URL for ease of 're-reaping' at a later date.
  • Strip HTML Tags - save text-based pages with their HTML tags removed.

Plus lots of other fixes. Note that some of these features haven't been completely tested (for example, I can't test the proxy authentication code as I don't have access to a password-protected proxy server!), so let me know if there are problems by mailing me at the usual address.

Also note that the new documentation has been 'thrown together', but is worth a read. I've taken out the screen shots (hence the smaller EXE to download) since a) they were pointless and b) I haven't had time to update them. A completely new (jazzed up) set of documentation is currently in production...


'Resume' mode added. When the "don't download files which exist locally" checkbox is checked in the options, WebReaper used to just skip/ignore any files for which there was an equivalent on the local machine (created using the 'save' options). Now WebReaper will actually read the local file, looking for links and downloading them. Because the local files are read extremely quickly, while files which were not previously downloaded are read from the internet, this mode can now be used as a 'resume' feature, effectively continuing a download from where it was previously terminated.

Note that the resources on the internet are checked and if they are more recent than those on the local machine they will be downloaded regardless; this means that the local files will always be as up-to-date as possible.

I have also added the facility to exclude/include URLs from the download as it happens. During a download, right-click on the URL tree to show a menu with 'Include URL' on it. If the URL is to be downloaded then this will be shown with a 'tick' next to it. By selecting the menu item you can toggle the status of the URL, including/excluding it as required. Note that when a URL is included/excluded, this setting is recursive - that is, all 'child' links will be changed to the same value. This allows entire branches of URLs to be included/excluded.
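The recursive include/exclude behaviour amounts to applying one flag to a whole subtree. Here is a minimal Python sketch in which a hypothetical `LinkNode` type stands in for an entry in the URL tree. It is not WebReaper's actual data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkNode:
    """Hypothetical node in the URL tree shown during a download."""
    url: str
    included: bool = True
    children: List["LinkNode"] = field(default_factory=list)

    def set_included(self, value: bool) -> None:
        """Apply `value` to this node and every link beneath it."""
        stack = [self]
        while stack:  # explicit stack avoids recursion limits on deep sites
            node = stack.pop()
            node.included = value
            stack.extend(node.children)

    def toggle(self) -> None:
        """What selecting the 'Include URL' menu item would do."""
        self.set_included(not self.included)


root = LinkNode("http://example.com/")
docs = LinkNode("http://example.com/docs/")
page = LinkNode("http://example.com/docs/a.html")
docs.children.append(page)
root.children.append(docs)

docs.toggle()
print(root.included, docs.included, page.included)  # True False False
```

The traversal uses an explicit stack rather than recursion so that a very deep site cannot hit Python's recursion limit. The effect is the same: every child link ends up with its parent's setting.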


Found out that the links in HTML tags could use single quotes and not just double ones, which was stopping WebReaper following links on some sites. Also, where the webpage size isn't available, the list display shows the amount of data downloaded (instead of the usual percentage read).
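A parser that understands HTML attribute syntax avoids this class of bug, because it does not need to match quote characters by hand. The sketch below uses Python's `html.parser`, which accepts double-quoted, single-quoted and unquoted attribute values alike. The set of link-carrying attributes is an assumption and is far from complete.

```python
from html.parser import HTMLParser
from typing import List

# Assumed (tag, attribute) pairs that carry links; a real crawler checks many more.
LINK_ATTRIBUTES = {("a", "href"), ("img", "src"), ("frame", "src"), ("link", "href")}


class LinkExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Values arrive with the quotes already stripped, whether the page
        # used "double", 'single' or no quotes around them.
        for name, value in attrs:
            if (tag, name) in LINK_ATTRIBUTES and value:
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = """<a href="one.html">1</a> <a href='two.html'>2</a> <img src=three.gif>"""
print(extract_links(sample))  # ['one.html', 'two.html', 'three.gif']
```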

Documentation fixed to reference pictures correctly (FrontPage Express stitched me up and put references to "Program Files" in there, so that if you installed the app anywhere else the pictures would fail to load). Thanks Microsoft.


Fixed the bug where the maximum download time was completely ignored and the app kept downloading regardless. When files are saved to the local hard disk, the server name is saved as part of the pathname to avoid file clashes. It was reported that files were being left locked whilst WebReaper was working, stopping them from being previewed until the download was complete; this should be fixed now.
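Saving under the server name means that two sites which both have, say, an `index.html` never overwrite each other. Below is a Python sketch of such a mapping. The `reaped` root folder, the `index.html` name used for directory URLs and the function itself are assumptions. `PurePosixPath` is used only so that the output looks the same on every platform.

```python
import posixpath
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def local_path(url: str, root: str = "reaped") -> PurePosixPath:
    """Map a URL to a file path of the form root/host/path (hypothetical)."""
    parts = urlsplit(url)
    host = parts.hostname or "unknown-host"  # urllib lower-cases the host name
    path = unquote(parts.path) or "/"
    if path.endswith("/"):
        path += "index.html"  # assumed file name for directory URLs
    # Resolve '.' and '..' so the result cannot climb above the site folder;
    # the path's case is kept as-is, since Unix servers treat it as significant.
    path = posixpath.normpath(path)
    segments = [s for s in path.split("/") if s]
    return PurePosixPath(root, host, *segments)


print(local_path("http://www.example.com/Docs/A.html"))
# reaped/www.example.com/Docs/A.html
print(local_path("http://www.example.org/"))
# reaped/www.example.org/index.html
```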

Unfortunately I can't seem to fix the IE4 dependency, so it looks like Microsoft wins again... you'll need at least a minimum install of IE4 for WebReaper to work... :-(


Fixed the bug where all URLs were converted to lower case, causing some files on unix servers to fail their download when their URLs contained upper-case letters.


Fixed a bug on Windows 95 where the files being saved locally weren't renamed correctly (they're created with a "~.tmp" extension and then renamed once the file download has completed - except they weren't on Win95 due to me using an NT-only rename function. Doh!). The files are now saved with the same date and time as the original source file. This allows WebReaper to skip files which haven't changed since the last download when using the "Don't download existing files" flag.
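The write-to-temporary-file, rename and copy-timestamp sequence can be sketched in Python as follows. The `~.tmp` suffix handling and the helper names are assumptions based on the description above. A modern analogue of the rename pitfall is that `os.rename` fails on Windows when the destination already exists, whereas `os.replace` overwrites it on every platform.

```python
import os
from email.utils import parsedate_to_datetime
from typing import Optional

TEMP_SUFFIX = "~.tmp"  # assumed naming for the in-progress file


def remote_timestamp(last_modified: str) -> float:
    """Convert an HTTP Last-Modified header value to a Unix timestamp."""
    return parsedate_to_datetime(last_modified).timestamp()


def save_download(dest: str, data: bytes, last_modified: Optional[str]) -> None:
    """Write to a temporary file, move it into place, then copy the
    server's modification time onto it."""
    tmp = dest + TEMP_SUFFIX
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, dest)  # overwrites an existing dest on Windows and POSIX
    if last_modified:
        stamp = remote_timestamp(last_modified)
        os.utime(dest, (stamp, stamp))  # (access time, modification time)


def is_unchanged(dest: str, last_modified: str) -> bool:
    """True if a local copy exists and the remote file is not newer."""
    return (os.path.exists(dest)
            and remote_timestamp(last_modified) <= os.path.getmtime(dest))
```

Because the local copy carries the server's timestamp, a later run can compare timestamps and skip files whose remote copy is not newer.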


Fixed a bug which meant that garbage appeared in files when they were saved to the local machine (using the Save tab). Also fixed some html parsing problems (found some links in a format I'd not come across before). Removed the password option - it doesn't work (more thought required). Added Save All and Download All checkboxes to override the type-selection lists. The file-extension filter makes a return due to popular demand, allowing files to be filtered on both file extension and file type. Changed type lists so that a context menu provides access to 'check all', 'uncheck all' and other related options - this is also supported on the Save tab. Offline mode now affects the global system; switching this mode in WebReaper will affect Internet Explorer, and vice versa. Application now distributed as a single executable (a self-extracting Installation program).


Fixed timeout problems, and added 'skipped' status for ignored files. New feature on 'General' options tab allows you to enter a username and password which will be used for any links which need to be authorised. Note that a) the password is stored in the registry, so clear any really secure ones before closing the application and b) I don't know if that feature works, as I don't have any username/password protected sites. Feedback please (whether it works or not).

Oh, and cut and paste now works in the address combo... :-)


More work on multi-threading (to fix timeout problems on large files with higher numbers of threads). Files which fail due to timeouts will now be retried - up to five times.
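A retry-on-timeout policy like this can be sketched in Python. The sketch reads "retried up to five times" as five retries after the first attempt, which is an assumption. Errors other than timeouts are passed straight to the caller. A real download through `urllib` would also need to treat a `urllib.error.URLError` whose `reason` is a timeout in the same way.

```python
import socket
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_RETRIES = 5  # assumed: five retries after the initial attempt


def with_retries(fetch: Callable[[], T], max_retries: int = MAX_RETRIES) -> T:
    """Call `fetch`, retrying only when it times out."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    attempt = 0
    while True:
        try:
            return fetch()
        except (TimeoutError, socket.timeout):
            if attempt >= max_retries:
                raise  # out of retries: report the timeout to the caller
            attempt += 1
```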

Improvements to the download filters: internet resources can now be downloaded/filtered using their actual object type (rather than just file extensions). New (previously unseen) resource types are added to the list as they are discovered. The Save options tab also uses this method to allow only particular types of file to be saved.

Cut & Paste to the address combo still doesn't work.... :-(


Restructured to use multiple threads for downloads, which should substantially improve performance.


A few minor fixes. New look toolbar (if you have IE4 installed). First version of documentation included.


Fixed major bug where the system image list (which provides the icons for files and programs in the Win95 Explorer) is damaged, causing all icons to disappear.


Added drag/drop support. Links can now be dragged to/from WebReaper and your browser.


Fixed a few more bugs, converted options into 'tabbed' property page.


Fixed crash when return is pressed from within the Address toolbar list. Links tree now only shows 'parent' links (i.e., links to pages that have sub-links). The details and tree view use the system file type icons. Added option to only download links in subdirectories of the root. New 'offline' mode which, combined with the ability to open links by double-clicking on them in the details pane, allows downloaded links to be browsed.


Converted 'address' field to toolbar, rearranged view panes. Improved link parsing (certain URLs were being ignored). Added better limit configuration. Also added the ability to save downloaded binary files on disk.


First public release.

Copyright © 1998-2006 Mark Otway. All rights reserved.