I came across an interesting problem the other day while working on an ATG project that was using DAS (Dynamo Application Server). The site had recently been updated to use UTF-8 as its encoding, and this appeared to work fine until it was noticed that occasionally some characters were being replaced by two unknown characters. The interesting thing was that inserting a space or some other character earlier in the page, so that a Latin-1 character fell into the same spot, resulted in the page rendering as expected.
Looking into the DAS code, it became apparent that when DAS compiled the Java servlets it defaulted the encoding to Latin-1 (ISO-8859-1), as defined in the JSP specification, if no encoding was explicitly declared. This in itself wasn't a problem, since when the servlet executed it used the encoding defined on the response to process the output stream. However, DAS also used this default encoding to decide whether the servlet should buffer its output in a ByteBuffer or a CharBuffer. If the encoding was Latin-1, which it was because we weren't setting it explicitly, it used a ByteBuffer. The result was that any UTF-8 character wider than a single Latin-1 byte that happened to straddle a buffer boundary was split in two, and each half was rendered as an illegal UTF-8 character.
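To illustrate what goes wrong when a multi-byte character is split across a byte boundary, here is a small standalone Java sketch (not DAS code; the class name and the chosen split point are hypothetical). It encodes a string as UTF-8, cuts the byte array in the middle of a two-byte character, and decodes each half separately, which is roughly what happens when the two bytes end up on opposite sides of a byte-buffer flush:

import java.nio.charset.StandardCharsets;

public class SplitUtf8Demo {
    public static void main(String[] args) {
        String original = "café";                                // 'é' is two bytes in UTF-8
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // 5 bytes total

        // Simulate a buffer boundary falling between the two bytes of 'é'.
        int split = 4;
        String firstChunk  = new String(utf8, 0, split, StandardCharsets.UTF_8);
        String secondChunk = new String(utf8, split, utf8.length - split, StandardCharsets.UTF_8);

        // Each half is an invalid UTF-8 sequence on its own, so each decodes to a
        // replacement character: the output is "caf" followed by two garbage characters
        // instead of "café".
        System.out.println(firstChunk + secondChunk);
    }
}

The two replacement characters in the output correspond to the "two unknown characters" seen on the page, and shifting the split point by one byte (the equivalent of adding a space earlier in the page) makes the problem disappear.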
The solution is either to set the encoding explicitly on each page using:
<%@page pageEncoding="UTF-8"%>
or to patch DAS so that it defaults to an encoding other than ISO-8859-1.