There is a write-up at Coding Insecurity on filtering non-ASCII characters to prevent XSS attacks.
"I have been working on a medium-sized development project lately
and came across a peculiar phenomenon where I could execute scripts on
a page without the use of less-than (<) or greater-than (>)
symbols. Instead I used double-byte characters. For a little detail on
the project, the technologies being used include Apache Struts 1.3.8,
the Commons Validator plug-in, DHTMLSuite, and some other AJAX style
controls to make the UI interactive. So now on to the findings.
To start out, the character encoding on all of the JSPs was set to ISO-8859-1.
In addition, validation was in place for all form fields, although, as it
turns out (much to my chagrin), clever people can bypass anything.
Since the field in question was free-form, a user could enter anything
they wanted; we developers had decided this was OK because we would be
vigilant in our use of output encoding. For more on why this should be OK, check out Jim Manico's blog article here.
To accomplish our output encoding, we decided to use the <bean:write
/> tag from the Struts 1.3.8 tag library as it has a fairly decent
encoding practice, although it does leave a little – or quite a bit
depending on your point of view – to be desired."
Read more: http://coding-insecurity.blogspot.com/2008/10/executing-scripts-with-non-english.html
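The excerpt does not spell out exactly how the double-byte characters turned into working markup, so the following is only a minimal Java sketch of one plausible mechanism, assumed here for illustration. Fullwidth forms such as U+FF1C (FULLWIDTH LESS-THAN SIGN) are not the ASCII `<`, so an encoder that escapes only ASCII metacharacters passes them through untouched. If some later step applies Unicode compatibility normalization (NFKC), they fold back into `<` and `>`. The `htmlEncode` helper is hypothetical and is not the Struts `<bean:write />` implementation; the class and method names are likewise illustrative.

```java
import java.text.Normalizer;

public class FullwidthXssDemo {

    // Hypothetical minimal HTML encoder (NOT the Struts implementation):
    // escapes only the five ASCII characters that are special in HTML.
    static String htmlEncode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c); // fullwidth forms pass through unchanged
            }
        }
        return out.toString();
    }

    // Unsafe order: encode first, then an (assumed) downstream NFKC step
    // folds the fullwidth signs into ASCII '<' and '>' after escaping is done.
    static String encodeThenNormalize(String input) {
        return Normalizer.normalize(htmlEncode(input), Normalizer.Form.NFKC);
    }

    // Safer order: normalize first, so the encoder sees the ASCII characters.
    static String normalizeThenEncode(String input) {
        return htmlEncode(Normalizer.normalize(input, Normalizer.Form.NFKC));
    }

    public static void main(String[] args) {
        // U+FF1C / U+FF1E are the fullwidth less-than / greater-than signs.
        String payload = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E";
        System.out.println(encodeThenNormalize(payload)); // live <script> tag
        System.out.println(normalizeThenEncode(payload)); // escaped text
    }
}
```

The point of the sketch is ordering: output encoding only protects the page if it runs on the final form of the string, after any transformation (normalization, charset conversion) that could manufacture new metacharacters.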