Comic Downloader v1.6.6 by Nyerguds
Build date: 01/02/2017 20:31:22.72

This tool is created for archiving webcomics. It is primarily based on the use of regular expressions, so knowledge of regular expressions is obviously required to understand how to use it. This tool was created purely for my personal use, and I can't be held responsible for anything that happens as a result of other people using it.

================
Options overview:
================

Comic info:
----------
-Name: If saved to the comics list, this will be the name of the entry. Re-saving a comic after a rename will create a new entry on the list. This is also what will be used as folder name if the option "Add folder for comic name" is checked.
-Start url: The url for the start of the comic's archive.
-Next page regex: The regular expression to find the "next page" url in the html page source.
-Next page regex group: The capture group in the "next page" regex to extract the url from.
-Image url regex: The regular expression to find the comic image url in the html page source.
-Image url regex group: The capture group in the image url regex to extract the url from.
NOTE: The urls can be absolute or relative. All types of relative urls are supported, and will be resolved as relative to the url of the web page. (A rough sketch of how these settings interact is shown after this section.)
-Download all matches: Enable this if one page can contain multiple images, either through repeated matches or repeated capture groups inside one match.
-Start url is an index page: [BETA FEATURE - will not always work correctly] All of the comic's pages are linked from the given page. This will go through all pages in the list one by one using the "next page" regex, and download the image(s) from each page.

File name methods:
-From final image url: Use the same filename as the one in the found image url.
-From image url regex: Capture a different group from the image url regex.
-Numeric by fetched page: Simply give the images a 5-digit number as file name (like "00001.png"). The start number can be given in the numeric control next to it.
-From new regex, on page (single): Uses a single string found on the html page as file name.
-From new regex, on page (following image url matches): Uses a regex that repeats with the image url regex, and follows the same capture group numbers and capture group repetitions as the image url one.
-From new regex, on page url: Uses a single string extracted from the page url as file name.
NOTE: If there are multiple images per page, but the selected filename method only gives one name per page (numeric / page regex single / page url regex), an alphabet letter suffix will be appended behind a dash (like "00001-a.png").
-Add detected extension to the end of the resulting filename: Explicitly adds an extension to the file, based on the detected file type. Note that this is always done for file names that contain no "." character.
-Add page number prefix to all images: Combines the "Numeric by fetched page" name with the otherwise created name, in the format "00001-imagename.png".
-Force repeat suffix on all images: If repetition is enabled, alphabetic repeat suffixes are normally only added when multiple matches are actually found. With this option, a single image found on a page will also get the "-a" suffix.
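To make the above a bit more concrete, here is a rough Python sketch of how the regex settings, relative url resolution and numeric file naming fit together. The actual program is a .NET application, so this only approximates the behaviour described above; the pattern strings, group numbers and urls are placeholder examples, not values shipped with the tool.

import re
from urllib.parse import urljoin

# Placeholder settings, roughly as they would be filled in on the form.
page_url        = "http://www.example.com/comic/page41.html"   # "Start url"
next_page_regex = r'<a href="([^"]+)"[^>]*>Next</a>'           # "Next page regex"
next_page_group = 1                                            # "Next page regex group"
image_regex     = r'<img[^>]+class="comic"[^>]+src="([^"]+)"'  # "Image url regex"
image_group     = 1                                            # "Image url regex group"
page_number     = 1                                            # "Numeric by fetched page" start number

html = "...fetched page source..."  # normally retrieved with an HTTP request

# "Next page" url: take the chosen capture group of the first match and resolve
# it against the page url, so both absolute and relative links work.
next_match = re.search(next_page_regex, html)
next_url = urljoin(page_url, next_match.group(next_page_group)) if next_match else None

# Image url(s): with "Download all matches" enabled, every match of the image url
# regex yields an image. (The tool also handles repeated capture groups inside one
# match, which Python's re module does not expose, so only repeated matches are
# shown here.)
image_urls = [urljoin(page_url, match.group(image_group))
              for match in re.finditer(image_regex, html)]

# "Numeric by fetched page" naming: one 5-digit number per page; when a page
# yields several images, a letter suffix is appended behind a dash ("00001-a.png").
for index, image_url in enumerate(image_urls):
    suffix = "" if len(image_urls) == 1 else "-" + chr(ord("a") + index)
    file_name = "%05d%s.png" % (page_number, suffix)  # ".png" stands in for the detected extension
    # ...download image_url to file_name here...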
Download info:
-------------
-Destination folder: The folder in which to download your comics. This is a global setting, not a setting per comic.
-Add folder for comic name: If you don't want to change the destination folder for each download, set the destination to a general comics folder and check this option to automatically create a folder for each new comic you save. Probably the easiest way.
-Limit download retries: Normally, 503 (Service Unavailable) and 504 (Gateway Timeout) errors will result in retries. This option sets the maximum number of retries before the comic download process will abort. On slightly less responsive servers, 10 is usually a decent value for this. If the option is disabled, the program will keep retrying indefinitely. (A sketch of this retry behaviour is shown after this section.)
-Start: Start the download process.
-Stop: Stop the download process. This will offer a prompt to preserve your progress by adapting the start URL and the start number.
-Test: Perform a test run without downloads. This will show you which text is captured by your regular expressions, and what the final image url, file name and "next page" url will be. It is strongly advised to thoroughly test your regexes this way before starting the download process.
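As a rough illustration of the retry limit, here is a minimal Python sketch, assuming a plain HTTP GET per file; the function and parameter names are made up for illustration and do not reflect the tool's internal (.NET) code.

import time
import urllib.request
from urllib.error import HTTPError

def fetch_with_retries(url, max_retries=10, delay=5.0):
    """Retry on 503 (Service Unavailable) / 504 (Gateway Timeout) responses.

    max_retries mirrors the "Limit download retries" value; passing None keeps
    retrying indefinitely, like running with the option disabled.
    """
    attempt = 0
    while True:
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except HTTPError as err:
            if err.code not in (503, 504):
                raise  # other errors are not retried in this sketch
            attempt += 1
            if max_retries is not None and attempt > max_retries:
                raise  # retry limit reached: this is where the download would abort
            time.sleep(delay)  # brief pause before the next attempt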
================
Version history:
================

v1.6.6
-Cancelling the download will now also abort between images on one page.

v1.6.5
-Added detection of the <base> tag to indicate the url base.
-Added button to clear the log.

v1.6.4
-Upgraded to .NET Framework 4.5.
-Fixed https downloads from updated servers.

v1.6.3
-Fixed the fact the "skip existing files" option wasn't saved to the ini file.
-Changed the default included mangafox template to include the extension.
-Added automatic continuation as an option.

v1.6.2
-An aborted download will now correctly suggest the current page instead of the next one if not all images from that page were downloaded.
-Index mode will no longer abort on a failed sub-page fetch, but will just log the error and continue with the rest of the list.
-The input url now accepts the "https" scheme.

v1.6.1
-Fixed a minor issue with shortcuts that still worked while the UI was disabled.
-Fixed a bug where the name pattern was cleared if a download was cancelled.
-Added buttons to insert regex templates into text boxes.
-Added button to open the download folder. This button is never disabled.

v1.6
-Slightly optimized the new UI.
-Fixed some errors in the disabling of components when changing options.
-Improved image file name cleanup, decoding HTML symbols and replacing quotes.
-Exporting comic settings to a file now suggests the comic name as file name.
-Added extended rename pattern for files. Not sure if it's possible to implement this for multiple image matches, though.
-Added support for question marks in filenames (thanks to the Ménage à 3 comic).
-Fixed handling of "connection closed" errors.
-Fixed the fact the Ctrl+E shortcut didn't work on disabled text boxes.
-Added the ability to link to an index of comic pages.
-Added cookies support for websites. They need to be added manually in the ini for now, though. The format is Cookies=name|value|path|domain;name|value|path|domain;name|value|path|domain (a parsing sketch of this format is shown after the version history).

v1.5
-The interface layout now uses two columns, to allow adding more options.
-Added a log viewer. Click the status label at the bottom to open it.
-Fixed the use of Ctrl+A in read-only and line-wrapping text boxes.
-Added a system for retry attempts when timeouts occur.

v1.4
-Fixed a small layout resize problem.
-Fixed a bug where urls ending on a dot could not be retrieved.
-Fixed a bug where a filename based on a regex fetch could crash the program.
-Allowed prefix page numbers to start from 0.
-Aborting the download process will now ask to replace the start parameters in the tool with the current download position.
-The "Stop" button will no longer be enabled at program start.
-Fixed a bug where message boxes were unselectable if they were shown while the program was minimized.

v1.3
-Added support for relative urls starting with a question mark.
-Fixed a crash that happened when the comic name contained illegal characters for creating a folder of that name.
-URLs will no longer be missing from the status bar if they are too long.
-Made sure the regex text boxes can't contain line breaks; it polluted the ini.
-Fixed errors occurring when the program's ini file is edited with Notepad.

v1.2.3
-Fixed url encoded characters in the resulting filenames.

v1.2.2
-Fixed html characters in urls not getting converted.
-Improved some of the menu item names and shortcuts.

v1.2.1
-Fixed deleting of items from the internal list.

v1.2
-Added loading and saving to the internal list. This list is stored in the program's ini configuration file.
-Made the a/b/c suffix an option on the UI.

v1.1
-Added six different methods of getting the filename for the final file.

v1.0
-Unreleased first version. Literally just accepted two regex patterns and their group number to download the comics, with automatic support for repeated image url groups.
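Since the Cookies ini format is only described in passing in the v1.6 entry, here is a small Python sketch of how such a value splits into individual cookies. The field order (name, value, path, domain) comes from that entry; the function name and the example value are purely illustrative, and how the program itself consumes these fields is an assumption.

def parse_cookie_setting(value):
    """Split 'name|value|path|domain;name|value|path|domain;...' into cookie dicts."""
    cookies = []
    for entry in filter(None, value.split(";")):
        name, cookie_value, path, domain = entry.split("|")
        cookies.append({"name": name, "value": cookie_value,
                        "path": path, "domain": domain})
    return cookies

# Example ini line:  Cookies=lang|en|/|.example.com;session|abc123|/|.example.com
print(parse_cookie_setting("lang|en|/|.example.com;session|abc123|/|.example.com"))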