| Version |
Changes/Additions |
10.0
|
Smart ordering when downloading - images, etc., are given priority, which should result in more complete pages being available when a site download is stopped before completion. New options replace .ASP/.PHP/.ASPX file extensions with .HTML, which works better for local browsing, and automatically add an .HTML extension to files that previously had none. New option to include the download date as part of the local folder name. Also added support for downloading images embedded in CSS files. This version is compiled with Visual Studio .NET.
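The extension handling described above could look roughly like the sketch below. This is only an illustration in Python, not WebReaper's actual code: the function name `local_name` and the exact set of extensions treated as "dynamic" are assumptions based on the three extensions named in this entry.

```python
import posixpath

# Extensions named in the release note. Treating exactly these three as
# "dynamic" is an assumption made for this sketch.
DYNAMIC_EXTENSIONS = {".asp", ".php", ".aspx"}


def local_name(url_path: str) -> str:
    """Return a hypothetical local file name for a URL path.

    Dynamic-page extensions are swapped for .html, and paths with no
    extension at all gain one. Directory-style paths ending in "/" are
    not handled here.
    """
    root, ext = posixpath.splitext(url_path)
    if ext == "" or ext.lower() in DYNAMIC_EXTENSIONS:
        return root + ".html"
    return url_path
```

For example, `local_name("/shop/index.php")` gives `/shop/index.html`, while an image path such as `/img/logo.GIF` is returned unchanged.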
|
9.8
|
Fix to the Favourites code to make it more robust - in particular fixing a
problem which caused the application to fail to start on NT4 systems.
|
9.7
|
I've had a few reports of stability problems since v9.3, and this version fixes
a couple of key problems which seemed to be causing this. There's also some
improved link searching within Java code, as well as the introduction of the
'Favourites' menu from Internet Explorer, allowing Favourite sites to be reaped
simply by selecting them from the menu.
|
9.6
|
There were some problems with v9.5, as Win9x was unable to load some of the
resources due to limitations in the OS. This meant that Win95/98 reported the
executable as being corrupt, whilst NT/2K could load/run it fine. In order to
avoid confusion, I flipped the version number up one once this was fixed.
|
9.5
|
A minor update with some fixes and improvements. In particular, this version
fixes the "Failed to Create Empty Document" bug which stopped the application
from starting.
|
9.4
|
A few minor fixes and optimisations, and a new version of the installer. In
particular URL contents filters now disregard the case of the URLs.
|
9.3
|
GetRight® support
- users running GetRight can switch on an option which will force all files
over a certain size to be passed to GetRight for segmented & resumable
downloads.
|
9.1
|
Added Macromedia/Shockwave Flash support.
|
9
|
Removed advertising system. Added keyword/content filter. Fixed 'base' tag
problem.
|
8.1
|
Added the ability for registered users to backup their registration data, in
case they want to move their registration to another machine, or reinstall
WebReaper.
|
8.0.1
|
Tiny fix for some people who were finding that v8 was still minimizing on
startup.
|
8.0
|
This version sees many of the existing problems that have been reported to me
resolved, plus a few extra features added. In particular:
-
New Filter Wizard, to make the construction of filters simpler.
-
New Help system - Windows-style with full search/index
-
Fixed a problem with file time comparison, which means that 'refreshing' a
previously downloaded site is much faster - only changed objects will be
downloaded. Previously, the local/remote timestamp comparison didn't always
work.
-
Link adjustment now works correctly across the entire locally saved site.
Previously, some links weren't adjusted correctly, meaning that the local site
couldn't always be browsed correctly.
-
Maximum size filter fixed (previous versions had bigger/smaller swapped).
-
View positions and column widths are now remembered between WR sessions.
-
'Minimize at startup' bug fixed.
-
Fixed the bug where the error 'Unhandled error during download' is occasionally
displayed, and the app needs to be restarted.
-
New option on the Threads page of the options dialog, which forces WebReaper to
wait for any currently downloading files to complete before the download is
aborted.
-
Statistics output to log to show the number of files downloaded/skipped/failed.
-
Added '-logurls' command-line option, which will generate a log of all URLs
processed, written to webreaper_url.log.
-
Plenty of other bug-fixes.
This version sees the introduction of the advertising system built into
WebReaper. I know some users aren't keen on this move, but support for the
application has grown into a time-consuming job, which has to pay for itself
somehow.
|
7.3
|
Finally fixed the registry corruption bug! Also, tweaked the referrer and
redirection handling code so that more sites will download correctly
(previously the wrong referrer string was being passed to the server, causing
it to return 'Access Forbidden'). URL profiles now store all settings, filter
and URL, and can be embedded in batch files.
|
7.2
|
Full release of previous beta, with new rewritten documentation, and lots of
fixes, speedups and improvements. Also added the voluntary registration -
donations towards development costs gratefully accepted. :-)
|
7.0b
|
Beta version released with redesigned configurable filters, better batch
handling, simpler configuration options and many speed increases.
|
6.4
|
Okay, so the flickering was still there - albeit whilst the application was
minimized. But now it's really fixed...
|
6.3
|
A quick little release to stop the desktop icons from flickering during reaping
sessions. It also fixes some problems which were causing some links in html
files not to be adjusted for browsing. Also in this release, there's a new
option which allows you to specify that the limits (max size, min time, etc)
are applicable to either HTML files, binary files or both. Leaving the log file
location blank will now result in no log file at all, rather than the previous
'webreaper.log' in the current directory.
|
6.2
|
It's been a while, but finally here's the next version. It's got a prettier
interface, some major bugfixes, and some new features too. I've tried to fix as
many of the bugs you've told me about as is possible, but keep 'em coming -
it's very helpful (and saves me from doing so much testing... ;-)
Here's the list of major bits which have changed:
-
Some fixes in the link parsing code
-
URL Exclusions added to options dialog.
-
Log file, with user-defined location
-
Temporary/output files should now be closed properly and deleted too.
-
Last modified date column in details pane.
-
New link processing order - inline images/objects are downloaded before other
links are followed.
-
New colours in links tree: black/pending, red/skipped, green/downloaded,
grey/excluded.
-
New icons in details pane, and redraw optimised to reduce flickering.
-
Pause/Resume download - the stop button now pauses a download (the number of
links remaining to resume is indicated in the status bar) allowing links to be
included/excluded by hand before resuming the download.
-
'Gentle mode' allows inter-object delay to save thrashing servers
(single-thread mode only).
-
Major memory leak fixed.
-
Single executable - smaller and no dependency on the MFC42.DLL.
-
All links fixed up, even if they're below the maximum crawl depth.
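The new link processing order in the list above (inline images/objects before other links) can be pictured as a two-level priority queue. The following Python sketch is purely illustrative; the class name `LinkQueue` and the two priority levels are assumptions, and the real implementation may well differ.

```python
import heapq
import itertools

# Lower number = fetched sooner. Two levels are assumed for this sketch.
INLINE, PAGE = 0, 1


class LinkQueue:
    """Queue that hands out inline objects before ordinary page links."""

    def __init__(self):
        self._heap = []
        # The counter keeps first-in, first-out order within one priority.
        self._counter = itertools.count()

    def push(self, url: str, inline: bool) -> None:
        priority = INLINE if inline else PAGE
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With this ordering, a page's images are fetched before the crawler wanders off to other pages, so a download stopped part-way leaves fewer pages with missing pictures.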
|
6.1
|
Not as many new features or fixes as I'd have liked, but it was about time to
get a new version released. Lots of tweaks and little fixes of things which
weren't very polished in v6.0.
Some of the more exciting changes:
-
Shortcut for locally browsable files now named with page title
-
New option 'intelligent modem dial'. If the application seems to have
difficulty in detecting a currently active internet connection, unchecking this
option should help.
-
New option to set the user-agent string. For advanced users, this allows
WebReaper to 'spoof' as other browsers, enabling browser-tailored pages to be
reaped (although I think that this goes completely against the 'generic
browsing' nature of the internet, but hey, that's Microsoft and Netscape for
you...)
-
New option on the 'Filter' page to stop the download of inline images.
Previously, the only way to filter them was to set a size limit.
-
Command-line support for batch files should now work
I know there's a few fixes missing which I did promise, but I've been very busy
and just haven't had much time to work on WebReaper. Hopefully in a couple of
weeks I should have more time to catch up on the massive list of requested
features.
|
6.0
|
Well, it was about time I started behaving myself and adhering to standards, and
a couple of severe telling-offs from server administrators have led to me
finally putting in a complete and proper implementation of the 'robots.txt'
exclusion standard. There is no override for this; if an administrator doesn't
want you reaping certain parts of their server, then that's their prerogative.
However, you could always mail them and ask for special permissions for the
'webreaper' user-agent. I ask that you move to this new version as soon as
possible - I've had various reports of WebReaper causing major problems by
reaping servers from which it should have been prohibited.
Please note: I have also had a few complaints about WebReaper being run
constantly on a server over a long period (6 hours, in one case). I have
decreased the maximum download time to 2 hours, but if you are planning a
'major reap' of a site (i.e. > 10-15 minutes), please contact the administrator
first to ask permission. Leaving WebReaper running for long periods can
cause major problems on servers, which could then lead to the services provided
there being withdrawn. Common courtesy never hurt anyone.
If the complaints continue to increase I may have to start distributing
WebReaper with licences so that users who abuse webservers can be identified -
this extra administration would almost certainly require me to change the
Freeware status of the software.
Also in this release is complete support for the 'If-Modified-Since' HTTP
header, which is a much more efficient method of getting files; as well as
reducing server load, it will also cut download times. This feature kicks in
whenever the 'check for changes' option is checked when reading local files,
and also if the new option is used, which allows you to limit downloads to
files which have changed in the last n days.
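For anyone curious what an 'If-Modified-Since' request looks like, here is a minimal Python sketch. It is not taken from WebReaper; the function name `conditional_request` is made up for illustration, and it simply assumes the timestamp of the local copy is what gets sent to the server.

```python
import email.utils
import urllib.request


def conditional_request(url: str, local_mtime: float) -> urllib.request.Request:
    """Build a GET request that asks for the file only if it has changed.

    local_mtime is the modification time of the local copy, in seconds
    since the epoch (an assumption about where the date comes from).
    """
    # HTTP dates are always expressed in GMT.
    stamp = email.utils.formatdate(local_mtime, usegmt=True)
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})

# When such a request is sent with urllib.request.urlopen, an unchanged
# resource shows up as urllib.error.HTTPError with code 304 and no body,
# which is why this method saves both server load and download time.
```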
I've also added a thread monitor, and cut down the number of states (and
therefore the number of required refreshes) for the details/tree panes. This
should reduce the amount of work being done by the CPU, whilst still allowing
you to keep up with the progress of the threads.
Oh, and of course the usual batch of little fixes for bugs which you've all been
telling me about. ;-)
|
6.0b
|
Never one to rest on my laurels, me. This version allows the robots.txt 'ban' to
be overridden, since (as you've all told me) just about every site seems to
have a robots.txt file on it. In the end, to avoid website owners' wrath, I've
left the decision to the user - with a suitable warning. However, any site
which really doesn't want WebReaper crawling its pages can override
this (mail me for details).
This release also allows you to save files locally without the IE cache being
filled (saving hard disk space) using the new "Don't save to IE cache" option.
Other fixes: the intelligent connect stuff now works correctly if you're using a
proxy server, and running WebReaper with a URL as a command-line argument will
immediately kick off a download.
|
5.4
|
Never rains but it pours. Seems that 5.2 & 5.3 kept resolving URLs down to http:///,
which doesn't work very well. I promise that this version works. Also
the tooltips in the main details pane stopped working, so I've fixed them too.
|
5.3
|
Ahem. Well. Errm. So, let's just forget about v5.2, shall we? Not the most
successful of releases - the bookmark bug was still present (just slightly
different) and the 'intelligent modem connect/disconnect' feature was actually
pretty dumb - it would disconnect the modem after each file was downloaded, and
then redial for the next one... Doh!
Anyway, things should be much better now. ;-)
|
5.2
|
-
Fix for the bug where the bookmarks ('#') aren't processed correctly
-
Fixed a nasty memory leak.
-
Robots.txt support to stop WebReaper crawling sites. Also, a page
will not have its links followed if a meta-tag is present with the name
set to 'robots' and the content set to 'nofollow'.
-
Dockbars for the batch and log windows.
-
Better link parsing - including bookmarks and BASE tag support.
-
Clean up temp (~.tmp) files during downloads.
-
Redesigned documentation (thanks to Carl Osterly)
-
Intelligent connection/disconnection before and after sessions.
-
Automatic software updates - WebReaper will check this website for new versions
Plus a few other minor fixes.
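The two robots checks in the list above can be sketched in Python using only the standard library. This is an illustration rather than the code WebReaper uses; the helper names `allowed` and `may_follow_links` are assumptions, and 'webreaper' is used as the user-agent name because that is the name mentioned in the v6.0 notes.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "webreaper") -> bool:
    """Check a URL against the text of a site's robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class RobotsMeta(HTMLParser):
    """Look for <meta name="robots" content="... nofollow ...">."""

    def __init__(self):
        super().__init__()
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = dict(attrs)
        if (values.get("name") or "").lower() != "robots":
            return
        content = values.get("content") or ""
        directives = [d.strip().lower() for d in content.split(",")]
        if "nofollow" in directives:
            self.nofollow = True


def may_follow_links(html: str) -> bool:
    """Return False if the page asks robots not to follow its links."""
    meta = RobotsMeta()
    meta.feed(html)
    return not meta.nofollow
```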
|
5.1
|
Minor release - fixes a slight parsing problem to do with bookmarks.
|
5.0
|
Major new release - the one you've all been waiting for! New features include:
-
HTML links in locally saved pages can now be adjusted for local browsing.
Websites saved locally should now be browsable with any browser (Netscape, IE,
etc.).
-
Lots of HTML parsing fixes - most pages should work now.
-
Proxy/Password support. Enter a username and password on the Security tab in
the Options dialog. Also, option to be prompted if the current password doesn't
work.
-
Newly designed simple screen for Min/Max limit options.
-
Details list sortable by clicking on headers - toggles through ascending,
descending & unsorted.
-
URL Profiles - save a complete set of options associated with a particular URL
for ease of 're-reaping' at a later date.
-
Strip HTML Tags - save text-based pages with their HTML tags removed.
Plus lots of other fixes. Note that some of these features haven't been
completely tested (for example, I can't test the proxy authentication code as I
don't have access to a password-protected proxy server!), so let me know if
there's problems by mailing me at the usual address.
Also note that the new documentation has been 'thrown together', but is worth a
read. I've taken out the screen shots (hence the smaller EXE to download) since
a) they were pointless and b) I haven't had time to update them. A completely
new (jazzed up) set of documentation is currently in production...
|
4.0
|
'Resume' mode added. When the "don't download files which exist locally"
checkbox is checked in the options, WebReaper used to just skip/ignore any
files for which there was an equivalent on the local machine (created using the
'save' options). Now WebReaper will actually read the local file, looking for
links and downloading them. Because the local files are read extremely quickly
but files which were not previously downloaded are read from the internet, this
mode can now be used as a 'resume' feature, effectively continuing a download
from where it was previously terminated.
Note that the resources on the internet are checked and if they are more recent
than those on the local machine they will be downloaded regardless; this means
that the local files will always be as up-to-date as possible.
I have also added the facility to exclude/include URLs from the download as it
happens. During a download, right-click on the URL tree to show a menu with
'Include URL' on it. If the URL is to be downloaded then this will be shown
with a 'tick' next to it. By selecting the menu item you can toggle the status
of the URL, including/excluding it as required. Note that when a URL is
included/excluded, this setting is recursive - that is, all 'child' links will
be changed to the same value. This allows entire branches of URLs to be
included/excluded.
|
3.7
|
Found out that the links in HTML tags could use single quotes and not just
double ones, which was stopping WebReaper following links on some sites. Also,
where the webpage size isn't available, the list display shows the amount of
data downloaded (instead of the usual percentage read).
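To show the single-quote problem, here is a rough Python sketch of a link pattern that accepts either quoting style. The regular expression and the name `extract_links` are assumptions made for illustration; a regex like this is only a simplification and does not cover unquoted attribute values.

```python
import re

# Matches href/src values in double quotes OR single quotes.
LINK = re.compile(
    r"""(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)')""",
    re.IGNORECASE,
)


def extract_links(html: str) -> list:
    """Return link targets in the order they appear in the page."""
    links = []
    for match in LINK.finditer(html):
        # Exactly one of the two groups is filled in for each match.
        double, single = match.group(1), match.group(2)
        links.append(double if double is not None else single)
    return links
```

A pattern that only looked for double quotes would silently miss every link written as `href='...'`, which is the behaviour fixed in this release.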
Documentation fixed to reference pictures correctly (FrontPage Express stitched
me up and put references to "Program Files" in there, so that if you installed
the app anywhere else the pictures would fail to load). Thanks Microsoft.
|
3.6
|
Fixed the bug where the maximum download time was completely ignored and the app
kept downloading regardless. When files are saved to the local hard disk, the
server name is saved as part of the pathname to avoid file clashes. It was
reported that the files were being left locked whilst webreaper was working,
stopping them from being previewed until the download was complete; this should
be fixed now.
Unfortunately I can't seem to fix the IE4 dependency, so it looks like Microsoft
wins again... you'll need at least a minimum install of IE4 for WebReaper to
work... :-(
|
3.5
|
Fixed the bug where all URLs were converted to lower case, causing some files on
unix servers to fail their download when their URLs contained upper-case
letters.
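The reason this mattered: the scheme and host name of a URL are case-insensitive, but the path is not on most unix servers. A hedged Python sketch of a safer normalisation follows; the function name `normalise` is hypothetical and this is not the application's own code.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Lower-case only the parts of a URL that are case-insensitive."""
    parts = urlsplit(url)
    # Path, query and fragment are left exactly as written. Note that this
    # sketch lower-cases the whole network location, so it is not suitable
    # for URLs that embed a user name or password before the host.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(),
         parts.path, parts.query, parts.fragment)
    )
```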
|
3.4
|
Fixed a bug on Windows 95 where the files being saved locally weren't renamed
correctly (they're created with a "~.tmp" extension and then renamed once the
file download has completed - except they weren't on Win95 due to me using an
NT-only rename function. Doh!). The files are now saved with the same date and
time as the original source file. This allows WebReaper to skip files which
haven't changed since the last download when using the "Don't download existing
files" flag.
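The save-then-rename approach and the timestamp trick can be sketched as below. This Python version is only an illustration under stated assumptions: the helper names are made up, the temporary name is formed by appending "~.tmp" to the final path (the notes don't say exactly how the name is built), and timestamps are compared to the nearest second.

```python
import os


def save_download(path: str, data: bytes, source_mtime: float) -> None:
    """Write to a temporary file, rename it, then copy the source's date."""
    tmp_path = path + "~.tmp"  # assumed naming scheme
    with open(tmp_path, "wb") as handle:
        handle.write(data)
    # os.replace works on both Windows and unix and overwrites any old copy.
    os.replace(tmp_path, path)
    # Give the local file the same modification time as the original.
    os.utime(path, (source_mtime, source_mtime))


def unchanged(path: str, source_mtime: float) -> bool:
    """True if a local copy exists with the same timestamp as the source."""
    return os.path.exists(path) and int(os.path.getmtime(path)) == int(source_mtime)
```

Because the half-written data only ever lives in the temporary file, an interrupted download never leaves a truncated file under the real name.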
|
3.3
|
Fixed a bug which meant that garbage appeared in files when they were saved to
the local machine (using the Save tab). Also fixed some html parsing problems
(found some links in a format I'd not come across before). Removed the password
option - it doesn't work (more thought required). Added Save All and Download
All checkboxes to override the type-selection lists. The file-extension filter
makes a return due to popular demand, allowing files to be filtered on both
file extension and file type. Changed type lists so that a context menu
provides access to 'check all', 'uncheck all' and other related options - this
is also supported on the Save tab. Offline mode now affects the global system;
switching this mode in WebReaper will affect Internet Explorer, and vice versa.
Application now distributed as a single executable (a self-extracting
Installation program).
|
3.2
|
Fixed timeout problems, and added 'skipped' status for ignored files. New
feature on 'General' options tab allows you to enter a username and password
which will be used for any links which need to be authorised. Note that a) the
password is stored in the registry, so clear any really secure ones
before closing the application and b) I don't know if that feature works, as I
don't have any username/password protected sites. Feedback please (whether it
works or not).
Oh, and cut and paste now works in the address combo... :-)
|
3.1
|
More work on multi-threading (to fix timeout problems on large files with higher
numbers of threads). Files which fail due to timeouts will now be retried - up
to five times.
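The retry behaviour amounts to a small loop. The Python sketch below is an illustration only; `fetch_with_retries` and its `fetch` argument are hypothetical names, and it assumes "retried up to five times" means one initial attempt followed by at most five retries.

```python
def fetch_with_retries(fetch, url, max_retries=5):
    """Call fetch(url), retrying when it times out.

    fetch is any callable that returns the downloaded data or raises
    TimeoutError. Other errors are not retried.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except TimeoutError:
            # Give up and report the timeout once the retries are used up.
            if attempt == max_retries:
                raise
```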
Improvements to the download filters: internet resources can now be
downloaded/filtered using their actual object type (rather than just file
extensions). New (previously unseen) resource types are added to the list as
they are discovered. The Save options tab also uses this method to allow only
particular types of file to be saved.
Cut & Paste to the address combo still doesn't work.... :-(
|
3.0
|
Restructured to use multiple threads for downloads, which should substantially
improve performance.
|
2.5
|
A few minor fixes. New look toolbar (if you have IE4 installed). First version
of documentation included.
|
2.4
|
Fixed major bug where the system image list (which provides the icons for files
and programs in the Win95 Explorer) is damaged, causing all icons to disappear.
|
2.3
|
Added drag/drop support. Links can now be dragged
to/from WebReaper and your browser.
|
2.2
|
Fixed a few more bugs, converted options into 'tabbed'
property page.
|
2.1
|
Fixed crash when return is pressed from within the
Address toolbar list. Links tree now only shows 'parent' links (i.e., links to
pages that have sub-links). The details and tree view use the system file type
icons. Added option to only download links in subdirectories of the root. New
'offline' mode which, combined with the ability to open links by
double-clicking on them in the details pane, allows downloaded links to be
browsed.
|
2.0
|
Converted 'address' field to toolbar, rearranged view
panes. Improved link parsing (certain URLs were being ignored). Added better
limit configuration. Also added the ability to save downloaded binary files on
disk.
|
1.5
|
First public release.
|