w3.org DTD/xhtml1-strict.dtd blocks Windows IE users?

Updated on: February 23 2009
Updated on: February 25 2009

On a few sites I maintain, we have several man pages set up using XML and XSL. This week we started getting complaints from Windows IE users saying they can’t see the man pages anymore. The error message is:
The XML page cannot be displayed
Cannot view XML input using style sheet.
Please correct the error and then click
the Refresh button, or try again later.

————————————————————-

The server did not understand the request,
or the request was invalid. Error processing resource
‘http://www.w3.org/TR/xhtm…

The header of my page has this in it:

<?xml version="1.0" encoding="UTF-8"?>


When I try to access either http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd or http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd in any browser except Windows IE, the page loads or downloads as expected. In Windows IE the only thing that is served up is “No”.

I am curious if others are seeing this. Is it a Microsoft problem? Is it a w3.org problem? As it stands, Windows IE users appear to be out of luck. Perhaps w3.org has simply had enough of Windows IE and wants it to go away?

I would love to hear other people’s results from trying to load these URLs, along with their comments.


Updated on: February 23 2009
Further investigation into this problem shows that the User-Agent string is the key to IE being blocked from accessing the DTDs on w3.org.


curl http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd -D ./dump.txt -A "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"

Results:
———————
No
———————

dump.txt
———————
HTTP/1.1 503 Go away
Date: Mon, 23 Feb 2009 13:48:30 GMT
Server: Apache/2
Content-Location: msie7.asis
Vary: negotiate,User-Agent
TCN: choice
Retry-After: 86400
Cache-Control: max-age=21600
Expires: Mon, 23 Feb 2009 19:48:30 GMT
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Content-Length: 2
Connection: close
Content-Type: text/plain
———————

curl http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd -D ./dump.txt -A "Mozilla/4.0 (compatible; Windows NT 5.1; .NET CLR 1.1.4322)"

Results:
———————
.
.
.
The entire DTD file is listed successfully
.
.
.
———————

dump.txt
———————
HTTP/1.1 200 OK
Date: Mon, 23 Feb 2009 13:50:22 GMT
Server: Apache/2
Content-Location: xhtml1-transitional.dtd.raw
Vary: negotiate,accept-encoding
TCN: choice
Last-Modified: Thu, 01 Aug 2002 18:37:56 GMT
ETag: "7d6f-3a72ac59d0900;45a3e4327da00"
Accept-Ranges: bytes
Content-Length: 32111
Cache-Control: max-age=7776000
Expires: Sun, 24 May 2009 13:50:22 GMT
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Connection: close
Content-Type: application/xml-dtd; charset=utf-8
———————

Further research shows that the offending part of the User-Agent string appears to be the MSIE token. Removing MSIE, or changing it in any way, results in the DTD being returned successfully.
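To make the pattern easier to test, here is a rough sketch of the rule the two curl runs suggest. The function names expected_status and live_status are made up for illustration, and the rule itself is only my guess from the responses above (503 with the body "No" when MSIE is present, 200 otherwise), not W3C's actual server configuration. The live check uses curl's -w '%{http_code}' to print just the status code; what w3.org returns today may well differ from these February 2009 results.

```shell
#!/bin/sh
# Hypothetical model of the blocking rule inferred from the curl tests.
# NOT W3C's real configuration: it only summarizes what the responses
# suggest, i.e. any User-Agent containing the exact token "MSIE" is refused.

# Print the status we would expect for User-Agent $1.
# The case pattern is case-sensitive, so "msie" or "MSIX" count as
# "changed" tokens and are expected to get the DTD.
expected_status() {
    case "$1" in
        *MSIE*) echo 503 ;;   # token present: "503 Go away", body "No"
        *)      echo 200 ;;   # token removed or altered: DTD is served
    esac
}

# Ask the live server for the transitional DTD with User-Agent $1 and
# print only the HTTP status code (the body is discarded).
live_status() {
    curl -s -o /dev/null -w '%{http_code}' -A "$1" \
        http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
}
```

Calling both functions with the two User-Agent strings used above, and printing the results side by side, shows whether the server still behaves the way this model predicts.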

I tried contacting w3.org last week when I first posted this, but apparently I had the wrong contact info, as no one has responded yet.


Updated on: February 25 2009
I heard back from w3.org. They responded with:

This is a known issue related to W3C’s excessive traffic [1]. We are
working with Microsoft, and a fix is expected in coming months.

[1] http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

It would appear that Windows IE is attempting to load the DTD on each page load, which is improper behaviour. Perhaps the only solution at this point is to host a copy of the DTD on our own server so that Windows users can still read the XML pages.
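A minimal sketch of the self-hosting workaround follows, assuming the pages reference the DTD by its full w3.org URL (the DOCTYPE or XSL output settings aren't shown above, so check where the reference actually lives). The function names fetch_dtds and point_to_local_dtd, and any local base path such as /dtd/, are hypothetical. Note that the XHTML 1.0 DTDs pull in the entity files xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent by relative path, so those need to be copied alongside the DTDs.

```shell
#!/bin/sh
# Hypothetical helpers for serving the XHTML 1.0 DTDs from our own server.
# Names and paths are illustrative; adjust them to your own layout.

W3C_DTD_BASE="http://www.w3.org/TR/xhtml1/DTD/"

# Download the DTDs plus the entity files they include by relative path
# into directory $1. Run this once, not on every page request.
fetch_dtds() {
    dest=$1
    mkdir -p "$dest" || return 1
    for f in xhtml1-strict.dtd xhtml1-transitional.dtd \
             xhtml-lat1.ent xhtml-symbol.ent xhtml-special.ent; do
        # -f: fail on HTTP errors, -sS: quiet but still report errors
        curl -fsS -o "$dest/$f" "$W3C_DTD_BASE$f" || return 1
    done
}

# Rewrite w3.org DTD references in file $1 to point at base URL $2,
# which should end in "/" and must not contain "|" or "&" (sed specials).
# Writes to a temp file and moves it back, avoiding non-portable sed -i.
point_to_local_dtd() {
    file=$1
    base=$2
    sed "s|http://www\.w3\.org/TR/xhtml1/DTD/|$base|g" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

Only the system identifier URL changes; the public identifier (for example "-//W3C//DTD XHTML 1.0 Strict//EN") is left as it is.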

Thoughts and suggestions are always welcome.