What's new in this version:
- Fixes a possible crash when using the 'highlight' feature from within the Link Inspector
- Updates the list of user-agent strings so that they're current versions
- Important fix for all users. Using 'Limit crawl based on robots.txt' could cause the scan to stall if the user is blocking AI training bots. This is because Integrity was misinterpreting "User-agent: Google-Extended" (Bard) as one of the web indexing Google bots.
- Fixes a problem with the link inspector not displaying all instances of the link url if it appears multiple times on a particular page
- Now generates a warning if a link appears to be internal (judging by its domain) but is marked rel=external. This may be deliberate and is legal, but there are serious SEO implications if it happens unintentionally.
- Despite collecting the data and possibly generating a warning, Integrity does not *observe* rel=external at this point. ie if a link is internal according to its domain, Integrity does not mark it as an external link or treat it as external (ie it will still follow it). This does raise some questions and the matter is left open at this point.
- Now observes rel=nofollow: when the engine comes to follow a link, it skips it only if *all instances discovered so far* are marked rel=nofollow. In other words, if any instance discovered by that point lacks the nofollow keyword in its rel attribute, the link will be followed.
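The rule above can be sketched as follows (a minimal illustration with assumed names, not Integrity's actual code): a link url is skipped only if every instance of it discovered so far carries rel=nofollow.

```python
# Sketch of the rel=nofollow policy described above. 'instances' is a
# list of rel-attribute strings, one per discovered occurrence of the
# same link url (e.g. '', 'nofollow', 'external nofollow').
def should_follow(instances):
    if not instances:
        return True
    # Skip only if *every* instance discovered so far carries nofollow.
    return not all('nofollow' in (rel or '').split() for rel in instances)
```

One instance without the nofollow keyword is enough for the link to be followed.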
- Fixes a possible issue with character encoding
- More clearly marks image tags with empty source. Previously 'flag missing link url' would also pick up img tags with missing src, if image check was switched on, but would confusingly mark them with the status "missing link url". Now the images are given the status "missing image src" so that they can be distinguished from missing link urls in the By Status and other views.
- Adds Expand All and Collapse All to View menu with keyboard shortcuts, and buttons for those functions to the toolbar palette. These will expand / collapse all items in the current view, if the view is expandable, eg by page, by status, Spelling by word.
- Fixes bug that may have prevented pdfs from correctly being included in the sitemap if the preference was checked or caused them to be included even if the preference was unchecked. Now this behaves as expected according to the preference.
- Fixes two settings that were not being saved: Rules > 'Consider http and https versions of the same url to be the same' and Rules > 'Consider http pages external'
- Fixes a loop that could occur when scanning locally (file://) with the 'test anchors' setting switched on
- Fixes issues with sorting some of the columns of the 'All links' table
- (Integrity Pro) Fixes 'list' view of sitemap visualisation not appearing correctly when the system is in light mode
- (Integrity Pro) Adds checkmark to the 'sort by' popup button above list of websites
- Small fix for a problem that will affect very few users, but will be a very important fix for those users. If the starting url has a path, and that path contains encoded characters (eg %20) and relative links on the page, then the crawl may not have proceeded past the first page.
- Adds 'redirect count' as an optional column to the All links table. Sorting this table will find links with the longest redirect chains. This appeared with the SEO results in version 10 and earlier. However, redirect count isn't a property of pages but link urls (different link urls may arrive at the same page via different numbers of redirects.)
- (not free Integrity) At the end of a scan, if user hasn't added a name for the configuration, "A new website" is automatically replaced with the first word or two from the starting url's meta title.
- Fixes a problem that could cause repetition or even a loop with certain urls that contain percent-encoded characters
- Adds advanced option to render the starting url in a webview (now performed on editing / adding the starting url). This can help to pass certain DDOS checks
- Fixes an issue causing some missing anchors to not be reported
- When testing anchors, if the anchor fragment of a url isn't found on the target page, link is now coloured red in the views and treated as a bad link, rather than being coloured orange as before.
- (Plus and Pro) Reinstates 'Export Image XML Sitemap', a feature of v10 that had been unintentionally missing from v12 to date
- (Plus and Pro) Tidies up the image checking functionality internally. Images must be included in the scan if images are to be included in the sitemap xml or an image sitemap, so image checking is forced on if either of these settings is enabled. To disable image checking (for efficiency with certain operations), all of these settings need to be switched off
- (Pro)Fixes Description Length column of SEO table showing incorrect value
- (Plus and Pro) Fixes empty lastmod tags appearing in sitemaps where no date is available. Tags are now correctly left out if no date is available (it's an optional tag). NB Google says that if no date is available, the date of sitemap generation should not be inserted
- Some fixes to the 'check anchors' functionality. Note that the status will be the http status regardless of whether the anchor is good or bad, ie '200 no error' even if the anchor doesn't appear on the target page. In the case of a problem with an anchor, a warning is created on that url which can be highlighted in orange depending on your Preferences, or shown in the warnings table in the case of Integrity Pro and Scrutiny.
- Fixes the option Preferences>Views>Display labels>'Display labels for redirected URLs'
- An efficiency improvement which may help with a slowing-down problem experienced by some users and will generally make the app more efficient for medium to large sites
- Fixes double-quotes in url (even if percent-encoded) corrupting bad links (by link) csv
- Fixes crash which happened under unlikely circumstances
- Fixes a bug that could have caused spurious statuses for certain urls (caused by the url containing a percent symbol which isn't part of a percent-encoding, which is bad practice anyway)
- Fixes an issue where some urls with unlikely character combinations in the querystring or path misleadingly displayed in tables as the domain only
- Some social and calendar 'add this' links are now listed but not checked. It's potentially not a good thing to request these urls to check them. They are major domains (eg Google, Outlook) and tend to be generated by plugins, and so more likely to give a false positive when tested than to actually be broken.
- Better handling of html5 picture tag
- (Plus and Pro) Preferences for Sitemap (whether to include unique image urls, audio, video, pdf) are now all switched on by default
- In the 'by link url' table, the starting url may have shown 'appears 0 times' in the On page column if the exact url didn't appear as a link (eg if you start at mysite.com but all 'home' links point to /index.html). Technically true but inelegant. Now shows "starting url"
- (Plus and Pro) Restores 'Check for updates' under the application's main menu. If enabled, an auto update check will be performed on startup and display a button in the main window. The auto update check can be switched on or off from within the manual update check window.
- Fixes a problem that led to the final status of certain urls (a redirect followed by an error) showing the 3xx redirect code rather than the final error code
- Fixes a bug with the redirect table in the Link Inspector showing the first redirect url on the first row rather than the starting url
- Fixes a problem that could (under fairly unlikely circumstances) lead to a page incorrectly being excluded from the sitemap for being marked 'robots noindex'
- Improvements to parsing srcsets
(Pro) Improves structured data functionality:
- Now allows for multiple chunks of json-ld on a page
- Now reads multiple items within @graph properly
- Displays the top-level @types from json-ld in the SEO->Meta data table
- Fixes the Preferences>Views>'Treat blacklisted urls as bad links' option
- Fixes the appearance of the headings 'outline' view in the page inspector
- Other small fixes
- (Plus and Pro) Adds 'Delete configuration' to context menu
- Fixes sorting by column in the 'by status' view, plus a small efficiency improvement related to building the 'by status' view at the end of a scan
- Fixes an issue that could have caused image urls after an audio or video tag to be corrupted and therefore test bad since 12.4.2
- Fixes relating to the 'on finish alert' preference
- (Pro) Fixes an issue which could sometimes cause scan to slow down and stop when using the rendering feature
- Updates to the manual
- Improves the 'robots noindex' search; false positives may previously have been seen for that
- Fixes a problem that prevented the contents (src) of iframes from being followed if the starting page consists of iframes and no other links
- Adds support for links to audio and video files within anchor (a href) tags as well as within audio and video tags; both markups are valid and are now correctly parsed, tested and reported
- (Plus and Pro) Adds 'Visit' to context (right-click) menu of website list
- Fixes problem causing css files to not be parsed for url('') images. (These image urls should be tested and reported if 'linked files' and 'images' are switched on in Options.)
- Fixes problem which caused starting urls with accented characters in the domain to stall
- Urls with special characters may have appeared encoded rather than decoded in one or two places including the Link Inspector 'appears on' table and redirect table
Improvements to soft 404 functionality:
- Certain social networking sites currently return a 'soft 404': a 200 code with a page that says "page not found" or similar. Detecting this requires the soft 404 feature to be switched on and configured. In some cases it can also require clientside rendering. External links to Twitter and Youtube are now automatically rendered in order to access and check page content (if soft 404 is switched on)
- Some terms are added to the default value for the soft 404 terms field. This will only affect new users or those who haven't altered the default list.
- When a possible soft 404 is detected, a warning is created which explains which term was matched on the target page. This can be seen in the Link Inspector and in the Warnings table.
- Adds context help to soft 404 settings in the Preferences window
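The detection described above amounts to matching the configured terms against the body of a 200 response. A minimal sketch (the term list here is illustrative, not Integrity's default list):

```python
def soft_404_term(status, page_text, terms=("page not found", "no longer exists")):
    # A soft 404 is a 200 response whose body says the page is missing.
    # Returns the matched term (for the warning text) or None.
    if status != 200:
        return None
    body = page_text.lower()
    for term in terms:
        if term in body:
            return term
    return None
```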
- Improves sorting in Link Inspector 'Appears on' table
- Adds 'Copy URL' to the context menu for that table
- Fixes a problem preventing the 'bad links only' button from working properly in 'by page' and 'by status' views (Integrity Pro and Scrutiny only)
- When Pause is pressed during scan, current connections are now stopped and returned to the check queue. Previously they were allowed to finish, which didn't cause a problem, but as the stats crept after pressing the button, it didn't look particularly 'clean'.
- By popular request, adds 'live updating' to the 'By link URL' table (and only if that tab is selected)
- It is best if the Settings tab or another tab is selected during the scan, as the live updating is an overhead and makes the scan slower and less efficient. To help with this, the 'live updating' happens periodically rather than with every url. Even so, it isn't advised for very large sites.
- (Integrity Pro) Fixes message on Spelling tab 'Spelling is disabled in settings' being displayed permanently
- Significantly improved and more efficient parsing for meta http refresh. Now checks the delay in seconds within the content attribute; if small (<6s) the redirect will be observed, otherwise it will be ignored
- Now ignores meta http refresh found within noscript tags
- Should make the crawl a little more efficient too, as the check for meta http refresh is made for every page parsed
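The delay rule above can be sketched like this (a simplified parse of the content attribute with assumed names; real-world values can be messier):

```python
import re

def parse_meta_refresh(content):
    # Parse a meta http-equiv=refresh content value such as '5; url=/new-page'.
    # Returns (delay_seconds, url_or_None), or None if unparseable.
    m = re.match(r"\s*(\d+)\s*(?:;\s*url\s*=\s*['\"]?([^'\"]*)['\"]?)?\s*$",
                 content, re.IGNORECASE)
    if m is None:
        return None
    return int(m.group(1)), (m.group(2) or None)

def observe_refresh(content, threshold=6):
    # Observe the redirect only if the delay is small, as described above.
    parsed = parse_meta_refresh(content)
    if parsed is None:
        return None
    delay, url = parsed
    return url if (url and delay < threshold) else None
```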
- Important update for Japanese users: better detection and proper implementation of EUC-JP character encoding
- Implements Page Inspector which opens rather than the Link Inspector where appropriate (eg SEO and Sitemap tables, top level of 'by page' view, or from within Link Inspector if the link url's target is a page.)
- Adds 'detailed diagnostic window for starting url', available under View menu
- Correctly handles response header field 'Refresh:', performing the refresh if the number of seconds is < 30
- Handles http redirect where the redirect url is empty
- Handles http redirect to a mailto: tel: etc
- Adds experimental preference for the Connection: header field (keep-alive or close).
- (Integrity Pro) Adds sitemap visualisation functionality
- Adds number of pages scanned to the main scanning status bar
- Small fix to the soft 404 check: it had been necessary to leave the 'terms' field in order to save any changes. Now simply typing is enough.
- Minor fix to the linksLimit field
- Fixes to timestamp functionality - now the request date/time shows in the Link Inspector
- If 'bad links only' was selected when application quit, on restart the application would be filtering 'bad links only' but the button would work in reverse, ie show all when depressed. Now fixed.
- Adds ability to fully edit items in rules table (double-click to edit)
- Fixes the meta data option being incorrectly disabled when the querystring option was switched on
- Fixes problem causing some links to be reported with no status, also resulting in "X of Y links checked" where X is less than Y when finished. This could have happened if Integrity first receives a link to a page which is different to its canonical url and happens to try to crawl that page before discovering that canonical url elsewhere.
- Stops data: urls within inline styles from being reported
- (Integrity Pro) adds 'Deep content' to the list of SEO filters (Threshold can be set in Preferences, default is 6 clicks from home.)
- Fixes a couple of problems with sitemap rules for setting priority / change frequency
- Fixes a problem which may have caused extra urls to appear in the sitemap / SEO table: if internal links on the site redirect to a different URL (which isn't ideal anyway), the original link url was being added to the sitemap rather than the final destination url.
- Adds context menu to SEO table view with some useful functions
- Switches off some debug messages which would have affected performance of the crawl when the archive feature was switched on
- Cleans up some warnings: some data: image srcs were being incorrectly included in the warnings, marked as having no alt text. Also cleans up the formatting; a spurious number was seen following the line number.
- Fixes sorting by status in link URL, By Page and All Links views
- Fixes problem some have experienced with larger sites: "The operation couldn't be completed. (NSPOSIXErrorDomain error 24 - Too many open files)"
- Fixes 'soft 404' check
- Adds an option to soft 404 check, allows you to limit the check to internal pages only (for the best results if set up properly), external only (can produce many false positives and false negatives due to the nature of soft 404s) or both.
- Fixes bug, related to recurring redirect, that could have caused a hang or crash
- Recurring redirects now correctly show as bad links
- Fixes a bug that caused external urls to load fully and unnecessarily. It wouldn't have caused a noticeable problem, but with the fix the crawl may be noticeably faster.
- Defines 'recurring redirect' to be more than 12 redirects. (Greater than 3 already signalled 'redirect chain' in SEO results.)
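Under these definitions, a link url's redirect count classifies it roughly as follows (thresholds taken from the notes above; names are illustrative):

```python
MAX_REDIRECTS = 12   # more than this counts as a 'recurring redirect'
CHAIN_WARNING = 3    # more than this is flagged as a 'redirect chain' in SEO

def classify_redirects(redirect_count):
    if redirect_count > MAX_REDIRECTS:
        return "recurring redirect"   # reported as a bad link
    if redirect_count > CHAIN_WARNING:
        return "redirect chain"       # SEO warning only
    return "ok"
```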
- Fixes bug which could cause 'running on' after crawl has apparently finished. Or in some cases a crash at that point. (related to meta refresh repeatedly redirecting)
- Fixes a bug that could cause pages to not be fully scanned under certain circumstances
- Fixes 'empty quote' / placeholder flagging
- Fixes problems in 'by page' view after clicking a header to sort by url
- Fix that prevents a possible (but very unlikely) crash
- Better handling of scan finishing, less chance of appearing to stick near or at the end of the scan
- Handles HTTP Basic authentication protection space (as defined in RFC7617)
- Alters default setting for request header field 'Connection' from 'close' to 'keep-alive'. Examples seen where 'close' causes 'Network connection lost' status.
- Other small improvements
- Fixes crash which may have been experienced when exporting links and choosing 'by link'
- Improves certificate authentication for client-certificate protection space - previously the scan may have appeared to hang at the end because of this
- Corrects an issue with the timeout field, which may have been incorrectly set to a low value when creating a new website config, perhaps resulting in some unexpected timeouts
- Slight change in behaviour: after pausing, views are populated so the partial crawl can be examined, or fixing work can begin after a pause/continue.
- 'Stop at X links' and 'Crawl maximum X clicks from home' are moved from Preferences to site-specific settings (Main window, Rules tab). This is also a fix as these settings weren't working properly in v12.0. This is a useful way to limit the crawl if it's not possible to limit a crawl by blacklisting.
- Change to the robots.txt policy. With 'limit crawl based on robots.txt' switched on, if there is a conflict, ie the same url is allowed and disallowed, then 'disallow' overrides.
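That tie-break differs from the longest-match resolution some crawlers use. A minimal sketch of the policy (simplified prefix matching, assumed names, not Integrity's code):

```python
def allowed_by_robots(path, disallow_rules, allow_rules=()):
    # Under the policy above, a matching Disallow overrides a matching
    # Allow, so an Allow rule never rescues a disallowed path; anything
    # not disallowed is crawlable. (allow_rules is kept in the signature
    # only to make the conflict resolution explicit.)
    if any(rule and path.startswith(rule) for rule in disallow_rules):
        return False
    return True
```

For example, a path matching both '/private/' (disallow) and '/private/page' (allow) is not crawled.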
- Adds an information box which is triggered if the scan stalls at the first url
- If main window is closed, but application remains open, a click on the dock icon re-opens the main window
- Corrects small glitch with the SEO table, preventing full display of the bottom row
- Fixes 'mark as fixed' and 'recheck this url' within link inspector window. Also fixes some possible refreshing issues if those actions used from context menus
- Fixes bug causing "ignore" rules to not be saved
- Adds "File>Return settings to default" menu option. Simple but useful, particularly with the free Integrity (which doesn't allow you to create new/multiple website configs) and particularly useful in support situations (most support issues involve a setting that needs to be changed, or more usually one that has been changed and didn't need to be).
- Version 12 becomes the general release. Phased, beginning with Integrity Pro, web download.
- Fixes the Locate function which appears in the Links context menus (with one instance selected). This is a powerful and useful feature, often overlooked.
- Other small updates related to Cloudflare and Incapsula blocking
- Small but important fix. If a self-closing style tag was found on the page (unlikely but valid), the parser would ignore the rest of the page.
- Enhancement to starting with a list of links. It's been possible to make a list of urls to different domains in order to scan multiple sites in one scan / one set of results. Now the 'down but not up' rule is applied to urls in that list, so it's possible to selectively crawl sections of a single site.
- (It is also possible to do this by setting up 'whitelist' rules, but this relies on there being links on your starting url to the areas that you want to scan.)
- Note that when using the list of deep links, the trailing slash is important. A url such as peacockmedia.software/mac/scrutiny will be assumed to be a page called scrutiny and the crawl will be limited to /mac/, but a url such as peacockmedia.software/mac/scrutiny/ is assumed to be a directory and the scan will be limited to /scrutiny/
- Improvements to the parsing of image srcsets
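The trailing-slash rule for deep-link lists can be sketched as (an assumed helper, using only the url path):

```python
from urllib.parse import urlparse

def crawl_scope(starting_url):
    # A url ending in '/' is treated as a directory and the crawl is
    # limited to that directory; otherwise the last path segment is
    # assumed to be a page and the crawl is limited to its parent.
    path = urlparse(starting_url).path or "/"
    if path.endswith("/"):
        return path
    return path.rsplit("/", 1)[0] + "/"
```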
- Integrity and Integrity Plus were incorrectly showing File > Export > Warnings, which would fail with a warning bell if chosen
- Integrity Pro would fail to export Warnings or Spellings if File > Export was used and the table in question had not been accessed in the UI first
- Some fixes to the config functionality, in particular, deleting the last config in the list, which would cause unexpected behaviour
- Now handles cookies by default. It's becoming more important and the original reason for having it off by default is less of an issue now.
- For each request, the request header field Cache-Control is now set to no-cache (rather than max-age=0) which may be the better way to force a fresh version of the page
- Important update for users of version 10.4.2+. If head tags exist but not body tags (which we believe is fine, as both are now optional), Integrity would fail to find links on the page
- Important update for users of 10.4.1. Fixes an issue with the link inspector not visiting / highlighting / locating the correct / selected page
- Fixes issue where multiple head sections would prevent proper parsing of some of the information in the head and could lead to incorrect warnings of missing title or missing description
- Very minor tweaks to the server request header fields which are sent with every request
- If an image url is empty, the alt text warning now says "empty" for the image url rather than being blank
- NB Integrity and Scrutiny support pages with multiple head sections (with warning), no head tags, no head or body tags.
- Minor correction to one of the warnings (p within heading): the warning said that p can only contain inline content (which is true), but in this case it should have said that heading tags can only contain inline content.
- Fixes a problem where multiple instances of the same page may have appeared in the 'Appears on' list in the link inspector if the anchors feature was turned on
- (Pro) Adds ability to see context for warnings. Sometimes it can be difficult to find the problem in a page, even given a line number. A double-click on a warning in the warnings table will open an inspector which will usually show a clip from the page source in the area of the problem.
- (Pro) Adds Headings table to SEO results (headings are still available in the main view)
- Interface changes, for user-friendliness - in Links results, 'By Link' is now 'Link URLs' and 'Flat view' is 'All Links'
- Very minor improvements to content-type / file type detection
- (Pro) A few small fixes to the default values of the SEO results tables
- (Pro) Fixes formatting of data in SEO headings columns (were unnecessarily padded with tabs)
- Now always sends the Accept-Language header with the default value of '*' for all requests
- Adds field to Preferences for user to add a custom value for the Accept-Language header, in order to control which language is selected, for websites which use this header to select which localisation of the website is served.
- Minor improvements to content-type / file type detection
- Improvements to robots.txt detection, parsing and applying
- Includes a bug fix: 'disallow' terms were being incorrectly applied to externally-hosted resources
- Includes a bug fix: disallow was matching anywhere in the path rather than being anchored at the root, so /reports/ in the disallow list would wrongly apply to domain.com/xxx/reports
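The fix boils down to anchoring the disallow match at the root of the path rather than matching anywhere in it (a sketch, with the old behaviour shown for comparison; names are illustrative):

```python
def path_disallowed(path, rule):
    # Fixed behaviour: the rule must match from the root of the path.
    return path.startswith(rule)

def path_disallowed_buggy(path, rule):
    # Old behaviour (for illustration): matched anywhere in the path,
    # so '/reports/' wrongly blocked '/xxx/reports/'.
    return rule in path
```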
- Adds HTML validation warnings "more than one title tag found" and "more than one opening html tag found"
- (Integrity Pro) Improves the html validation where 'render page (run js)' is switched on (not recommended unless absolutely necessary). Previously there would have been some false positives (eg 'no doctype' when a doctype is present), some warnings would have been masked, and line numbers would have been inaccurate, because the html would have been parsed after the page render. Now an additional pass for warnings is made over the pre-rendered source.
- Important fix for all users: fixes issue where crawl would stall if the starting url contains a meta refresh without a url, ie just a refresh which isn't a redirection to another url. This might have also prevented pages within the site from being crawled properly if they contained such a meta refresh, but this is likely to have gone unnoticed.
- Reduces the height of the preferences window, was previously too high for some screens
- Fixes false positives reported where an srcset has a hanging comma (which is failed by the w3c validator as "empty image-candidate string")
- The above situation is reported in the warnings. Integrity and Integrity Plus show these in the Link Inspector; Integrity Pro and Scrutiny show html warnings in a table.
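A hanging comma leaves an empty image-candidate string at the end of the srcset. A simplified candidate split illustrating the warning (real srcset parsing has more edge cases, eg commas inside urls; names are assumptions, not Integrity's code):

```python
def parse_srcset(srcset):
    # Split a srcset into (url, descriptor) pairs, flagging empty
    # image-candidate strings such as the one left by a hanging comma.
    candidates, warnings = [], []
    for raw in srcset.split(","):
        candidate = raw.strip()
        if not candidate:
            warnings.append("empty image-candidate string")
            continue
        parts = candidate.split(None, 1)
        candidates.append((parts[0], parts[1] if len(parts) > 1 else ""))
    return candidates, warnings
```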
Some improvements relating to images:
- Image urls within source srcset=.... were being collected and checked even when 'check images' was switched off
- Images with querystring after the file extension were not being recognised as images under certain circumstances (eg if they had a bad status, or if no mime type is included in the response header)
Some improvements to checking list of links / local files: (web download version. Not supported with MAS version due to Apple's sandboxing requirement):
- Enables files stored in certain locations outside the user directory
- Fixes problem with case sensitivity check when a file location involves a symlink
- Handles certain trackback links, no longer reports them as bad links
- Fixes problem where under unlikely circumstances, spurious character(s) find their way into an image or link url, causing bad link to be reported
- Important fix - link urls within html area map were incorrectly being marked as images, which in Plus and Pro could prevent the page from appearing in the sitemap if the image map is the first occurrence of that url that Integrity discovered.
- Adds 'trust invalid server certificate' (internal domain / subdomains only). Allows scanning of site while certificate is out of date or not yet installed properly.
- (Pro only) Fixes a bug which caused pages to be incorrectly reported as having 'robots nofollow' (occurred when another page which genuinely is robots nofollow links back to the first page).
- Fixes a problem with the 'flag blacklisted' option. Blacklisted urls (ie 'do not check links containing...') were not being flagged properly. The option is now renamed "Treat blacklisted urls as bad links", and with that option switched on, those urls now show up when filtering 'bad links only'.
- Integrity free and Plus released as v10.2.1 for consistent numbering and in order to gain some of the general fixes and enhancements that have been released in Integrity Pro since v10. They do not contain the html validation functionality which is the major part of version 10 of these applications.
- Finds image urls in certain lazyload attributes and meta tags, when either lazyload or 'look in meta tags' is switched on
- As a policy, now reports but doesn't test certain urls such as xmlrpc.php and about:blank. They won't appear in Warnings, but will be listed as "not checked" so that the webmaster can see that they exist on the page. Checking these urls isn't helpful. They may exist for perfectly legitimate reasons such as part of a lazyload system or pingback system.
- Now recognises and warns about unterminated or nested link tags, which are illegal in html. Previously if this problem existed on a page, it could cause some spurious minor symptoms such as a link url being incorrectly reported as an image
- (Plus and Pro) Updates the Paddle licensing framework to the latest version, which is Big Sur and M1 compatible
- Extends recent quote escaping enhancement to link exports
- Fixes bug preventing the crawl from starting after a local list of links has been opened
- Adds an efficiency which helps when scanning a very long list of links (csv, xml or txt, thousands of links). Previously it might have appeared that Integrity would hang for some time before the scan started running
- Fixes possible crash which may have happened at any point in a scan for certain websites since v9.10.0
- Small fix with parsing the robots.txt file
- Always parses robots.txt if present and for each url, notes whether the url is allowed or disallowed. If disallowed, a note is made in the url's warnings (warnings are highlighted in orange in the links tables, the actual warnings can be seen in the link inspector)
- Adds 'limit crawl based on robots.txt' setting. (Plus and Pro) Whether disallowed pages are included in the sitemap is decided by Preferences>Sitemap>Observe robots.txt
- (Pro) Adds 'Disallowed by robots.txt' choice to the filter button in SEO
- (Pro) Adds 'Multiple H1' and 'No H1' choices to the filter button in SEO; they will also appear, if appropriate, in the short summary above the SEO table.
- Fixes problem with 'warnings' filter option in links 'by status' view
- When testing linked files, now automatically ignores the wordpress rest api files which return an unauthorised status when tested, leading to unnecessary concern
- Adds support for charset=GBK, charset=koi8-r, charset=euc-kr and some other Latin and non-Latin character encodings. For certain websites using these encodings, page titles and certain other information may have been garbled before.
- Some improvements around starting your scan with a list of links. In particular, automatically differentiating between txt and csv file types (this fixes a bug where a url containing a comma within a txt file would be incorrectly split).
- Fixes a couple of situations that could result in incorrectly-constructed link urls and therefore false positives
- Better handling of escaped forward slashes in urls
The jump in version number is for consistency with Scrutiny, although many of the changes in Scrutiny 9.8 are Scrutiny-specific (relating to insecure content checks). Integrity benefits from the following changes:
- Adds option to search certain meta tags for urls. Those urls will be link-checked and also checked to see whether they count as insecure / mixed content. The meta tags in question are meta name=, meta itemprop= and meta property=. This includes social media tags such as meta property=og:image
- Very small fix to prevent some false positives arising from SVG masks in style sheets
- (Integrity Pro) Adds 'Manage custom dictionary' button above the spell-check table. This tool provides an easy way to see your list of 'learned' words (to check that you haven't 'learned' any misspelled words) and to 'unlearn' any that you learned by mistake.
- Diagnosis feature: If debug console verbosity is switched to 'ridiculous', the html received from the starting url is printed to the debug console.
- Fixes blacklist rules table sometimes not clearing when user creates a new website config
- Adds 'Links in' and 'Links out' tables to the page inspector (accessed via the 'target page' tab of the link inspector)
- Fixes relative links being constructed incorrectly where page being parsed is a directory url
- In the warnings tab of the link inspector, if there was a warning about a redirection, it may have contained the final url twice instead of the original url and final url
- Change log not available for this version
- Important release for all users. Eliminates some spurious 'bad links' by correctly ignoring certain tags which often don't contain a full resource url and can return a bad or unexpected status when tested.
- Adds new columns rel = sponsored and rel = ugc to 'by status' and 'by page' views
- Adds sortable columns to links views and link inspector for rel = sponsored and rel = ugc. These columns are hidden by default but can be shown using the 'columns' selector above each of those views
- With the new 'check anchors' switched on, urls with #anchor fragments were sometimes incorrectly appearing in the Sitemap and SEO tables
- Fixes urls being duplicated in Sitemap table under certain circumstances and settings
- Fixes bug causing redirect to not be reported if the reason for the redirect is only to add or remove a trailing slash, and 'ignore trailing slash' option is switched off
- Very important fix to the new anchor checkbox. If left on and greyed out by switching on the querystring checkbox, it could cause an infinite loop in the scan.
- Fixes issue with new anchor feature. If an external link contained an anchor and appeared multiple times, each instance was listed separately in the 'by link' view.
- Adds ability to test anchors. You can switch the option on using a new checkbox on Integrity's first tab.
- This will cause urls like /index.html#top and /index.html#bottom to be reported as separate links (resulting in more data) and tested separately (more cpu and time for the crawl)
- If a link url has a #fragment then Integrity will report the server response code as before (coloured red if status is bad). The anchor has no bearing on this. However, if the status is good, then Integrity makes a further check to see whether a name or id can be found on the target page matching the link fragment. If not, this is added to the link's warnings, and the link will be marked orange.
- You can view the details of the warning in the Link Inspector
- Note that the anchor check is case-sensitive; officially, anchors are case-sensitive. Some browsers may treat anchors as case-insensitive, but that doesn't mean all browsers will, and it doesn't mean that it's right.
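A fragment check along these lines could scan the target page for a matching id or name attribute, comparing case-sensitively. This is an illustrative sketch (not Integrity's actual implementation) using a naive regex rather than a full html parser:

```python
import re

def find_fragment_target(html, fragment):
    # Collect id="..." and name="..." attribute values from the page,
    # then look for a case-sensitive match with the link's #fragment.
    # (Naive illustration; a real crawler would use a proper html parser.)
    targets = re.findall(r'\b(?:id|name)\s*=\s*["\']([^"\']+)["\']', html)
    return fragment in targets

page = '<h1 id="Top">Title</h1><p><a name="bottom"></a></p>'
```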
- Note that you can't 'ignore querystrings' and also test the anchors, since the anchor fragment comes after the querystring.
- The Filter button contains a new item 'Warnings', which shows only links with warnings; this will include links with anchors where the anchor (a name or an id) can't be found on the page
- As far as the Filter button is concerned, 'Warnings' doesn't include redirects, even though both are coloured orange in the interface and the Link Inspector's Warnings tab does include redirects. The Filter button allows you to separate them
- The filter button option 'Redirects' will still show redirects, even if you've chosen 'do not report redirects' in Preferences.
- Typing a '#' into the search field will show links which contain a #fragment (Plus and Pro only)
- Warnings (which have been reported in the link inspector since v9.0) now cause the link to be coloured orange in the views. As some people like to work towards a clean set of results and may not consider the warnings important, the colouring of warnings can be switched off in Preferences > Links > Warnings. The 'Warnings' filter will still work when colouring of warnings is switched off in Preferences.
- Fixes garbage urls caused by a url containing a comma, or a data: image within an srcset
- Fixes a bug that's unlikely to have been noticed. If a url redirects and the redirect url has a # fragment, the rule is traditionally that those fragments are simply trimmed, but they weren't being trimmed for redirect urls. That is now fixed, and of course the new preference to not ignore anchors is respected.
- Irons out a problem causing links to be marked external if the case of a link's domain doesn't match the starting domain, ie starting at foo.com, a link to FOO.com would be incorrectly marked as external
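Host names are not case-sensitive, so a comparison like this should normalise case before comparing. A minimal sketch using Python's urllib (the function name is hypothetical; Integrity is not written in Python):

```python
from urllib.parse import urlsplit

def same_site(start_url, link_url):
    # urlsplit(...).hostname returns the host lowercased, so FOO.com
    # and foo.com compare equal. (Hypothetical helper for illustration.)
    return urlsplit(start_url).hostname == urlsplit(link_url).hostname
```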
- Fixes line number column of 'appears on' table within link inspector window
- Small fix - unquoted link hrefs with no character before the closing bracket weren't being logged properly, leading to some spurious results
- If a meta http-refresh type redirect goes from an internal url to an external one, the link was being left marked as 'internal'. It's arguable whether such a link is internal or external, but it's important for certain internal processes that it's marked as external when the redirection occurs. This was already happening properly for the more usual types of redirect
- Important fix for anyone who needs to export to csv, html or xml sitemap. Fixes crash which may have been experienced on recent versions of the OS after OKing file save dialog
- Better handling of situation where image urls are being checked and an image with alt text is within a regular a href link which also has some link text appearing after the image and within the link. The link is now correctly reported with the link text and the image url is correctly reported with its alt text
- Fixes a bug causing certain links in the above situation to be missed (ie where there is an image beside the link text within a link) and where the new 'lazy load' feature is switched on
- Small improvement to 'lazy loaded' image finder. Now finds video and audio urls in the source tag / data-src element
- Fixes issue that would prevent Integrity from running under certain circumstances, ie on older systems (MacOS 10.13 or earlier) where the server can serve content using Brotli compression
- Integrity users on MacOS 10.13 or earlier should download this update. It shouldn't make any difference for users on 10.14 or higher
- The main tables now retain their selection when sorted, as expected
- Support button added to diagnostics window which shows if unexpectedly few results are found
- If 429 codes are encountered (too many requests), more information is given in the Link Inspector's Warnings tab. A 429 may come with a 'retry after' which Scrutiny honours. It may also provide some information in the html of the page which follows the 429 code. All of this information is sent to that link's warnings for the user to see
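The 429 handling described can be sketched as: honour a numeric Retry-After header when one is present, otherwise fall back to a default wait. The function name and the 60-second default are assumptions, not Integrity's actual code:

```python
def retry_delay(status, headers, default=60.0):
    # For a 429 (too many requests), return seconds to wait before a
    # retry, honouring a numeric Retry-After header when one is given.
    # (Hypothetical helper; the 60s default is an assumption.)
    if status != 429:
        return None
    value = headers.get("Retry-After")
    try:
        return float(value)
    except (TypeError, ValueError):
        # Retry-After may be absent or an HTTP date; fall back.
        return default
```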
- Fixes a bug causing bad links to be reported incorrectly when the link contains a fragment (#something) as well as non-ascii characters in the link
- If a user-agent string for a mobile browser is being used, some sites generate an 'intent://' url. Integrity no longer reports 'unsupported url' for such links
- Disables tabbing mode (View > Tab bar) which was causing confusion if accidentally switched on. (Integrity isn't document-based)
- Improvement to 'lazy loaded' image functionality. Adds Blocs to the supported systems
- Adds .webp to the list of recognised image extensions (used in various places within Integrity)
- Adds option to look for 'lazy loaded' image urls. There are various ways to implement lazy loading but Scrutiny should find them in the case of the most common implementations
- If a meta http refresh is within comments then it's now correctly ignored
- Fixes small bug that was preventing the app from running on Catalina
- Adds 'line number' to link instances (the line number of the link within the html file) - there's now a column to show this number in the 'by link' view (when urls are expanded), by status, links flat view and the table within the link inspector
- Fixes bug that was causing broken images to not be shown in links view when Filter button was set to Images. The same bug may have had other symptoms too relating to broken images (Plus and Pro)
- Fixes possible problem of some repetition in the 'columns' selector of certain tables
- Fixes problem with 'Target Page Inspector' button within Link Inspector window when the Link inspector was opened from certain views
- Fixes bug with subdomain option which could cause certain external links to be incorrectly marked as internal
- Fixes links incorrectly reported broken (the link reported with extra text or another url tacked onto the end) when the href isn't terminated by quotes or a space but by the end angle bracket
- Adds 're-check parent page of url' to context menu in 'links by status' view
- Some fixes to the rechecking functionality when called from the By Status view
- Adds detection of unclosed comment tags and unclosed script tags; these are included in 'Warnings'. In future the number of possible things that you can be warned about will grow
- Adds Warnings into diagnostics window
- Change to the internal flow. Previously link urls were stored 'unencoded' and 're-encoded' for testing (unicode characters and reserved / unsafe ascii characters). This is fine 99.9% of the time, but sometimes this unencode/re-encode cycle produces a different result from the url as it originally appeared on the page, and the server doesn't respond to the changed version. This could cause Integrity/Scrutiny to report 404 for a link which works on the page.
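The decode/re-encode pitfall can be seen with a percent-encoded slash. A minimal illustration using Python's urllib (for demonstration only; Integrity is not written in Python, and the example url is hypothetical):

```python
from urllib.parse import quote, unquote

# A percent-encoded slash (%2F) inside a path segment is significant
# to many servers, but a decode/re-encode cycle turns it into a plain
# '/', producing a url the server may not recognise.
original  = "/report/2023%2F10/view"   # as it appeared on the page
decoded   = unquote(original)          # '/report/2023/10/view'
reencoded = quote(decoded)             # '/' is a 'safe' character, so it stays
```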
- Internal note: entities are still unescaped (eg &quot;); we consider that part of the encoding of the html page
- Link text is now searched when using the search box and the 'by page' view
- Redirect chains included in warnings
- Better handling of redirection from a http or https url to a tel:, mailto: etc. Does not create a warning but cancels the connection and sets the status to 'not checked'. The redirect details can be seen within the link inspector.
Redesigned Link Inspector:
- puts redirects on a separate tab rather than a pop-up window
- adds warnings tab, contains details of anything that gives this link an orange 'warning' status
- traditionally the orange 'warning' status meant redirect(s) but now can include a number of other things
- adds 'target page' tab, which shows certain target page properties and a button to access Page inspector
- adds sortable tables of inbound links and outbound links
- adds download time and mime type to page inspector
- Patches bug which could have caused the odd link url to be missed or a spurious link url reported if certain unlikely code appears in the page
- Fixes bug which was causing urls to be reported bad where they were found as the src of certain tags (iFrame, Embed, Script) and were not quoted
- Fixes some unexpected urls appearing in Link views when the search box is used
- Improvement to subdomain comparison, internal links with subdomains may have been considered external if the starting url had a non-www subdomain (This all depends on the 'consider subdomains internal' option being switched on)
- Fixes fatal error if the option to check linked files is switched on and a css file isn't served with UTF-8 encoding
- Adds context menu to table within link inspector. Contains Visit, Highlight, Locate (as per the buttons below, which work if you first select a page within the table)
- Engine now correctly ignores 'data-' elements within link tags. This was leading to some spurious results
- Further improvements to 'soft 404' functionality. If the target of a link returns plain text rather than formatted html, Integrity now handles this. If the target page is formatted html and has a title, the title is also now searched for the list of soft 404 terms.
- Further small fix for a potential problem with pattern matching (as used in site search, blacklisting, soft 404 etc)
- Fixes a bug causing the crawl to stall under obscure circumstances (starting the scan at a deep url, where the deep url contains an asterisk character)
- Fixes problem of 'soft 404' search returning 'near matches'. It now searches literally for the string(s) you enter
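A literal search of this kind reduces to a plain substring test against each of the user's phrases; a hypothetical sketch of the behaviour described (not Integrity's actual code):

```python
def is_soft_404(page_text, terms):
    # Literal substring search for the user's soft-404 phrases; no
    # pattern matching, so 'near matches' are not reported.
    # (Hypothetical helper illustrating the behaviour described.)
    return any(term in page_text for term in terms)
```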
- Corrects odd behaviour when a canonical tag appears twice on a page. This situation is handled more gracefully
- Able to pull image urls from css style sheets and check their status (if the 'check linked js and css files' option is switched on)
- (Integrity Pro) Fixes bug causing some code to appear in stripped plain text if tags have no whitespace between - this could cause spurious words to appear in the spellcheck
- Important fix: a bug could cause a crash during the scan in certain circumstances (though it wasn't reported many times). It was also causing some inefficiency
- Integrity, Integrity Plus and Integrity Pro are now notarized by Apple (security checked and certified). This requires that they run under 'hardened runtime' which is also a security measure
- Search box for link results is now a literal full match
- Subtle improvement to html parsing relating to comments
- Better handling of SSI where the include happens within an html tag
- Some engine improvements re extracting canonical url
- Improvement to subdomain handling. The subdomain option 'treat subdomains of starting url as internal' may have not worked as expected if the starting url had a subdomain already, including www. This option should now work as expected for starting urls that include www
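A subdomain option like this could compare against the starting url's registered domain rather than its full host. A simplified sketch, assuming a two-label registered domain (so it would be wrong for suffixes like .co.uk, which need a public-suffix list); names are hypothetical:

```python
def is_internal(start_host, link_host):
    # Treat any subdomain of the starting url's registered domain as
    # internal, even when the starting url itself has a subdomain (eg www).
    # Simplified sketch: assumes a two-label registered domain.
    def root(host):
        return ".".join(host.lower().split(".")[-2:])
    return root(link_host) == root(start_host)
```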
- (Integrity Plus and Pro) Fixes a bug with the sitemap csv export which could cause some unexpected urls in the results (no problem with the xml or other formats)
- Fixes a couple of problems that could cause the scan to speed up above the limit set in Settings : Timeout and Delays
- Change to the 'Limit Requests to X per minute' setting - it had originally been set to reject anything below 30. That's now reduced to 10, as some sites are getting more difficult to scan, with various ways of detecting automated requests
- Fixes bug relating to the blacklist / whitelist rule table, specifically when editing a value, and removes the option for 'Only follow' which was logically flawed and should have been removed when the 'does not contain' option was added. Users should use 'do not follow urls that don't contain' instead
- Improves iFrame support
- Fixes problem with img alt text being truncated if it contains a single quote character
- Important patch for an obscure problem causing an incomplete scan in unlikely circumstances
- Fixes bug that may have caused a crash with certain urls
- Further work on the improvement to the meta http-equiv refresh handling
- (Pro and Plus) 8.1.9 was incorrectly sandboxed, possibly resulting in website configurations not being visible for users upgrading to 8.1.9 from an earlier version and then to 8.1.10. Users should contact support for the solution in this case
- 10.14 Mojave dark-mode-ready
- Fixes 'next bad link' button in link inspector
- Fixes a bug which would have caused Integrity to stall at the first url (reporting that as a 200 but going no further) under an unlikely set of circumstances
- 10.14 Mojave dark-mode-ready
- Different handling of a common issue: linkedIn urls returning a 999 code (even though the link may work in a browser). This is not an Integrity issue but common to all webcrawlers / testers. LI seems to detect the rapid requests and/or non-browser querystring and returns a non-standard 999 code. Integrity used to present this as a server error and count it as a bad link. Now it labels it as a warning, and does not count it as a bad link. This is because it is not necessarily a bad link, it just hasn't been possible to test it properly.
- Fixes issue with meta http-refresh not being observed if the page contains content with links. (The content was being parsed for links, in favour of the redirection being observed.)
- (Pro) (Build 8.1.81) Fixes bug causing no data to show when 'duplicate descriptions' is selected in SEO Filter button
- Fixes bug which may have been responsible for some unexpected results for some users
- Enables dark mode when using MacOS 10.14 Mojave (will respect the user's choice of dark or light mode in System Preferences)
- (Pro) Enables keyword density functionality in SEO table (keyword stuffed pages)
- Better handling of a recurring 'Refresh' header field which could have appeared to leave the scan hanging when almost 100% finished
- Some improvements to the sorting and filtering which should prevent a short hang when using the 'bad links only' checkbox in the links results. There may still be a bit of a delay with some large sites and when the 'by status' tab is selected.
- Fixes problem with the 'Images' option in the Filter button, which was showing some urls which weren't images
- Fixes problem with headings / outline in page inspector (accessed from 'by page' view and double-clicking on a page rather than a link)
- Other small fixes
- Fixes problem scanning a site locally when the directory path contains a space or certain other characters
- Adds overrides for the built-in behaviour which excludes pages from the sitemap if they are marked robots noindex or have a canonical pointing to another page. These options are in Preferences > Sitemap; they're on by default and should only be switched off in rare cases where it really is necessary, such as using the sitemap for a purpose other than submission to search engines (where you do want all internal pages in the file)
- Updates links within the app and dmg (support, EULA etc) to new https equivalents
- Fix to Links/By Link table which was not remembering its column information
- Adds support for tag
- Adds detection of audio and video mime types. The filter button in Integrity Plus and Pro allows you to see audio urls / video urls
- (Pro and Plus) Adds the options to include video in the xml sitemap
- Fixes case where a set of circumstances could cause the scan to appear to finish early (with an error shown for the first url) while the scan actually continued
- (Integrity Pro) Adds some options for spell-checking: to ignore the contents of certain elements such as nav, header and footer, to check only the contents of certain elements, and to check the contents of image alt text
- Note that the option to check spelling within nav, header and footer is off by default
- Fixes Preferences > Links > Do not report redirects
- Further measures to reduce 'false positives' (a key v8 feature). In this case, a 403 (forbidden) may be returned if the user-agent string is Googlebot or otherwise not a browser. Where a 403 is received, and the user has the user-agent string set to Googlebot or Scrutiny, the url is retried once, with cookies, the GET method and the user-agent string set to that of a regular browser
- Doubles the alt text buffer, alt texts of more than 1,000 characters were regularly being seen
- Fixes Preferences > Links > Do not report redirects which has not been working properly in v8
- When user marks a link as fixed, the redirect information for that link is now correctly cleared
- Now correctly handles a link where href = './'
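Resolving an href of './' follows standard relative-url rules: it resolves to the directory of the current document. A quick illustration with Python's urljoin, which implements the same resolution rules (the base url is hypothetical; Integrity is not written in Python):

```python
from urllib.parse import urljoin

# './' resolves to the directory containing the current document.
base = "https://example.com/blog/post.html"  # hypothetical page url
resolved = urljoin(base, "./")
```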
- Allows for longer srcsets (>1000 characters). Previously, truncated urls may have been reported due to a buffer limit
- Fixes sorting in Spelling / by page table
- Adds context menu to sitemap table (copy url / visit url)
- Fixes problem with context menu in SEO / meta data table, 'copy url' or 'visit url' could work on wrong url
- Adds context menu to spelling / by word table (copy url / visit url)
- Adds option to spelling / by word table to 'remove without learning'
- Adds column 'og:locality' to SEO / meta data table
- Fixes bug causing spurious results to appear in the links tables sometimes when using the search box
- (Integrity Pro) enables toolbar 'get info' button for Spelling view
- (Integrity Pro, not MAS) implements update check
- Fixes a problem where 'Don't follow nofollow links' could prevent the crawl from getting off the ground
- Fixes problem in the sorting of Sitemap by 'priority' if any rules are in play
- Fixes bug preventing the sitemap 'priority' column from being manually edited if the sitemap rules table is empty, and bug preventing the 'change frequency' column from being edited manually
- Enables 'double click to preview' in SEO / Images table
- Fixes problem where an unlikely set of circumstances could cause a crash (a certain unintended spurious character included in the link target url, plus a specific page encoding)
- Fixes bug that prevented full scanning if port number used in the starting url
- Restores ability to scan a site locally (file://)
- Adds ability to attempt to scan a Wix site. There's no option for the user; a Wix site is autodetected using the generator meta tag
- We don't endorse or encourage the use of Wix; their dependency on ajax breaks accessibility standards, makes their sites difficult for machines to crawl (ie SEO tools and search engine bots), and impossible for humans to view without the necessary technologies available and enabled in the browser.
- Fixes bug in 'highlighting', if the link occurred more than once on the page, only the first would be highlighted properly
- Fixes minor bug in column selector above certain tables, for French users
- (Integrity Plus) Fixes bug preventing pages from being correctly excluded from sitemap where robots noindex is set in the page head
- (Integrity Plus) Fixes bug causing potential crash if pages are excluded from the sitemap