There is a write-up at Coding Insecurity on filtering non-ASCII characters to prevent XSS attacks.
"I have been working on a medium-sized development project lately and came across a peculiar phenomenon where I could execute scripts on a page without the use of less-than (<) or greater-than (>) symbols. Instead, I used double-byte characters. For a little detail on the project, the technologies being used include Apache Struts 1.3.8, the Commons Validator plug-in, DHTMLSuite, and some other AJAX-style controls to make the UI interactive. So now on to the findings.
To start out, the character encoding on all of the JSPs was set to ISO-8859-1. In addition, validation was in place for all form fields, although, much to my chagrin, it turns out that clever people can bypass anything. Since the field in question was free-form, a user could enter anything they wanted, which we developers had decided was OK since we would be vigilant in our use of output encoding. For more info on why this should be OK, check out Jim Manico's blog article here. To accomplish our output encoding, we decided to use the <bean:write /> tag from the Struts 1.3.8 tag library, as it has a fairly decent encoding practice, although it does leave a little - or quite a bit, depending on your point of view - to be desired."
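For readers who have not worked with output encoding, here is a rough sketch in Java of the general idea: replace the characters that are significant in HTML with their entity equivalents before writing user input to the page. The class name OutputEncodingSketch and the method escapeHtml are hypothetical names made up for this illustration. This is not the Struts source, and the set of characters <bean:write /> actually escapes may differ. The fullwidth characters in the second call are also just an assumed example of non-ASCII input, not the payload from the write-up.

```java
final class OutputEncodingSketch {

    // Hypothetical helper, not Struts code: escapes five characters that are
    // significant in HTML element and attribute contexts.
    static String escapeHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);        // everything else passes through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // ASCII angle brackets are neutralized.
        System.out.println(escapeHtml("<script>alert(1)</script>"));

        // Assumed example: fullwidth look-alikes (U+FF1C, U+FF1E) are not in
        // the escape list, so a filter like this leaves them unchanged.
        String fullwidth = "\uFF1Cscript\uFF1E";
        System.out.println(escapeHtml(fullwidth).equals(fullwidth));
    }
}
```

A filter of this shape only knows about the characters it lists, so anything outside that list reaches the page as-is. That is presumably why filtering or encoding non-ASCII input comes up as an extra layer in the write-up.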