17 May 2012

Word 2010 Error: "Problem with the contents of the file"

Update 1: Microsoft has released a solution that may work for some people (released 18/02/2013, nearly a year after this post). Download the Fix-it from this page and see if it repairs the file for you.
Update 2: There is also a similar blog post here which explains the same fix I describe (it is dated later and describes it in a different way).

There is a bug in Word 2010, which I have now come across for the third time, to do with XML validation of documents that are saved over a network.

You may have seen this before:


The file 113 Honey Lane brochure.docx cannot be opened because there are problems with the contents.
The XML data is invalid according to the schema.
Location: Part: /word/document.xml, Line: 2, Column: 0

Some online articles report this as a problem with the table of contents; however, I've worked with affected documents that don't have a table of contents. I also cannot see any Microsoft response to this issue, and I hope to get their attention so that it gets fixed.

Word does have an 'Open & Repair' option; however, it only tries to read the text from the document, so it doesn't help us at all. It leaves us stranded with a 99.99% valid XML document that we are unable to use - that is one of the pitfalls of XML, I guess. But really this is a bug that Microsoft needs to correct.

How to fix

1. Extract the DOCX file using a compression program (e.g. 7-Zip) to a new folder.
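
If you would rather script this step than use 7-Zip, here is a minimal sketch in Python. It relies only on the fact that a DOCX file is a ZIP archive; the function name `extract_docx` and the example paths are my own, purely for illustration.

```python
import zipfile
from pathlib import Path


def extract_docx(docx_path, out_dir):
    """Unpack a .docx (which is a ZIP archive) into out_dir.

    Returns the list of part names found in the archive,
    e.g. 'word/document.xml'.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(out)
        return archive.namelist()
```

For example, `extract_docx("brochure.docx", "brochure_extracted")` (hypothetical file names) should leave a `word/document.xml` file inside the new folder.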



2. The 'Location' on the error dialog is supposed to be where the problem is. However, when it says Line 2, Column 0, that is simply where the main body opens, which means the actual XML problem could be anywhere. If we knew where it was, we would just need to go to Line 2, Column 286 or whatever, and correct the XML. When the location is like this, we need to validate the XML instead. I recommend Oxygen XML Editor:

http://www.oxygenxml.com/download_oxygenxml_editor.html
(30-day free trial licence sent by email, 90MB download)

3. Open up Oxygen and open the word/document.xml file from the extracted folder.


4. Validate the XML.
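
As a rough first check without Oxygen, the sketch below uses Python's standard XML parser to find the first place where the file stops being well-formed. Note the assumption: this is not the schema validation that Oxygen performs. It only catches problems such as missing or mismatched closing tags, and it stops at the first one it finds. The function name is my own.

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(xml_path):
    """Return (line, column, message) for the first parse error,
    or None if the file is well-formed XML.

    This checks well-formedness only, not validity against a schema.
    """
    try:
        ET.parse(xml_path)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```

Running it on the extracted `word/document.xml`, fixing what it reports and running it again should walk through the broken tags one at a time. A file that passes this check may still have schema errors, so it is worth validating properly afterwards.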


5. The important errors are right at the bottom of the validation report. Some errors are permissible.



The ones in red are the ones we need to fix. The ones in green are fine.

6. Fix the XML. CTRL+L takes you to the location shown in blue (Line 2, Column 365...). Here is one example of XML that needs to be fixed; in general, the problems I've seen have involved incomplete tags and missing closing tags.


Here, the <mc:Fallback> tag doesn't have a closing tag. Also, there is no closing </mc:AlternateContent> tag before the new <w:p>. Here is the corrected version. I actually couldn't fix the first 4 validation errors from above, so I just removed the <mc:Fallback> block altogether (it is a fallback, so it is not important anyway).




We can see that the same block now validates OK (there is no longer a red dotted line under the tree leaf on the left). I had to add 2 closing elements, and Oxygen actually autocompleted them as soon as I typed </ so I didn't need to type out the whole thing ;)

After saving and validating again, we can see there are 2 more errors somewhere else in the document; again, these just need some more closing tags.
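
If you want to look at the raw XML around a reported location (such as Line 2, Column 365) outside an editor, a small helper like the one below can print the surrounding text. This is only a sketch: the name `show_context` is made up, and I am assuming a 1-based line number, a 0-based column counted in characters, and a UTF-8 file. The numbering your tool uses may differ by one.

```python
def show_context(xml_path, line, column, width=80):
    """Return the text surrounding a reported error location.

    Assumes `line` is 1-based and `column` is 0-based, counted in
    characters. Up to `width` characters are returned on each side.
    """
    with open(xml_path, encoding="utf-8") as handle:
        lines = handle.read().split("\n")
    text = lines[line - 1]
    start = max(0, column - width)
    return text[start:column + width]
```

Because nearly all of `word/document.xml` usually sits on line 2, slicing by column like this is more practical than scrolling through one enormous line.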

7. When the document validates with no more major errors (ignoring those to do with namespaces/context), zip up all the files in the extracted folder again (it must use the ZIP format; other parameters don't matter) and rename the archive to .docx. Now open it in Word and the error is gone.
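
The re-zipping can also be scripted. The sketch below is one way to do it in Python, assuming the output file is written somewhere outside the extracted folder; `repack_docx` is a name I have made up. The important detail is that the files are stored relative to the extracted folder, so that `[Content_Types].xml` ends up at the top level of the archive rather than inside a subfolder.

```python
import zipfile
from pathlib import Path


def repack_docx(folder, docx_path):
    """Zip the contents of `folder` (not the folder itself) into docx_path.

    docx_path should be outside `folder`, otherwise the archive
    would try to include itself.
    """
    root = Path(folder)
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Names are stored relative to the extracted folder.
                archive.write(path, path.relative_to(root).as_posix())
```

For example, `repack_docx("brochure_extracted", "brochure_fixed.docx")` (hypothetical names again) writes the archive with the .docx extension directly, so no separate rename is needed.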


Online articles


http://answers.microsoft.com/en-us/office/forum/office_2010-word/word-2010-document-will-not-open-illegal-xml/0d6d05df-d201-4537-a154-7b2a3df4e32a?msgId=c4b8dd1b-2fe4-4245-aa39-d6df9ed6cbe7

http://help.wugnet.com/office/Word-2007-XML-error-message-ftopict1173860.html

http://help.wugnet.com/office/open-docx-file-content-error-ftopict1081932.html

http://www.pcreview.co.uk/forums/office-open-xml-file-docx-cannot-opened-t3714016.html