Space Hound 4 - [ About Microsoft Word Duplicate Content Search ]
About Microsoft Word Duplicate Content Search (From Specialty Reports)
Duplicate Microsoft Word files often cannot be found with ordinary duplicate file tools because, byte for byte, they are frequently different files. Microsoft Word documents are stored in what is called Compound Document format.
Note: Use of this feature requires a local installation of Microsoft Word from the Office XP edition or later.
In a recent test, a document containing a single sentence of 35 letters consumed over 26,000 bytes of file space. All Microsoft Word documents include a block of approximately this size to store statistical and informational values such as total edit time, date last printed, and previous revision information.
This information can vary between two documents, and the informational blocks may not even be the same size. This is why two Word documents with identical textual content are often missed by other duplicate file finders.
The Microsoft Word Duplicate Content Search in Space Hound is designed to find files based on what they contain, regardless of file name, file size, or the other miscellaneous information stored within them.
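As an illustration only, and not a description of how Space Hound itself is implemented, content-based matching can be sketched as fingerprinting each document's extracted text rather than its raw bytes. The names text_fingerprint and group_by_text, the whitespace normalization, and the extract_text callable are all assumptions made for this sketch. A real extractor would have to obtain the text through Microsoft Word, so a plain dictionary stands in for it here.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def text_fingerprint(text: str) -> str:
    """Hash the textual content only. Collapsing runs of whitespace
    before hashing is an assumption of this sketch."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_by_text(
    paths: Iterable[str], extract_text: Callable[[str], str]
) -> List[List[str]]:
    """Group paths whose extracted text has the same fingerprint.
    File name, file size, and stored metadata play no part."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[text_fingerprint(extract_text(path))].append(path)
    # Only groups with more than one member are duplicates.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical stand-in for text extracted from three documents.
fake_docs = {
    "report.doc": "Quarterly results are attached.",
    "copy of report.doc": "Quarterly  results are attached.\n",
    "notes.doc": "Meeting moved to Friday.",
}

duplicates = group_by_text(fake_docs, fake_docs.__getitem__)
print(duplicates)
```

Because the fingerprint is computed from the text alone, the two report files fall into the same group even though their names differ and their stored bytes would differ.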
Special Instructions:
There are a number of situations where the results of this tool may be unsatisfactory. These mostly involve damaged Microsoft Word documents. A damaged document can cause unpredictable results within Microsoft Word itself. In many cases, Space Hound will detect these problems and bypass the affected documents. When a document is bypassed, it is added to an Exceptions List (called EXCEPTION.TXT) that is stored in the folder where Space Hound is installed. The Exceptions List can be viewed in any text viewing program. It includes the name of the bypassed file and some information about why it was bypassed. To help improve Space Hound's ability to handle problem documents, please send the document to Fineware Systems via email at fineware@fineware.com. Please include the Exceptions List as well.
Additional Background:
Some Word documents may not contain any text at all. They may include only image files or other inline material. Space Hound is primarily concerned with the textual content of the document. However, some inspection is performed to help rule out false positive results. If two documents are reported as identical but do not contain identical information, please forward both documents to fineware@fineware.com.
Temporary documents, identified by the tilde symbol [ ~ ] at the beginning of the file name, are not included in the search, nor are template files (.DOT). Documents 'in use' cannot be examined.
Documents are not altered by the program or by Microsoft Word, even when found to be damaged. You may wish to open a damaged document manually with Microsoft Word to try to repair it once it has been identified by Space Hound and included on the Exceptions List.
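The exclusion rules for temporary and template files can be expressed as a simple file name check. This is a minimal sketch under the assumption that the rules are applied to the file name alone. The function name is_searchable is hypothetical and is not part of Space Hound.

```python
import ntpath  # Windows path rules, available on every platform


def is_searchable(path: str) -> bool:
    """Return False for files the search is described as skipping:
    names starting with a tilde, and .DOT template files."""
    name = ntpath.basename(path)
    if name.startswith("~"):
        return False  # temporary document
    if ntpath.splitext(name)[1].lower() == ".dot":
        return False  # template file
    return True


candidates = [
    r"C:\docs\report.doc",
    r"C:\docs\~$report.doc",
    r"C:\docs\letter.DOT",
]
print([p for p in candidates if is_searchable(p)])
```

Documents that are 'in use' are not covered by this check, since that condition cannot be determined from the file name.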
If the program appears to be hung (no animations, no status bar changes) for several minutes, it is likely that an error has occurred within Word that Space Hound cannot bypass. You can try using the Task Manager to end any open instances of WINWORD (the name of MS Word's executable task); however, this could have unpredictable results. Space Hound will try to recover, but whether it succeeds depends on the type of problem encountered.
During document text inspection, the name of the document that is about to be examined is displayed on the bottom status bar. Should a problem occur that cannot be handled by Space Hound and Word, either use Word to Repair the document if possible or remove it from the folders being examined until after the Duplicate Word Content Search has completed.