net.sf.jmatchparser.util.charset
Class UTFBOMCharsetsProvider

java.lang.Object
  extended by java.nio.charset.spi.CharsetProvider
      extended by net.sf.jmatchparser.util.charset.UTFBOMCharsetsProvider

public class UTFBOMCharsetsProvider
extends CharsetProvider

Charset provider that supplies a UTF-BOM.charset charset for every other supported charset, and a charset-BOM charset for each UTF charset.

The charsets prefixed with UTF-BOM. try to detect a UTF-16LE, UTF-16BE or UTF-8 byte order mark at the start of the input.

If no byte order mark is detected, they fall back to the charset given at the end of the charset name (for example, UTF-BOM.ISO-8859-1 would fall back to ISO-8859-1).
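The detection-with-fallback behaviour described above can be sketched with the standard JDK alone. This is an illustrative sketch under stated assumptions, not this library's implementation: the class name BomSniffDemo and its decode helper are hypothetical, and the whole input is assumed to be available as a byte array.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; BomSniffDemo is a hypothetical name,
// not part of net.sf.jmatchparser.
final class BomSniffDemo {

    /** Decodes data using the charset indicated by a BOM, or the fallback if none is found. */
    static String decode(byte[] data, Charset fallback) {
        Charset cs = fallback;
        int skip = 0;
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {        // UTF-8 BOM
            cs = StandardCharsets.UTF_8;
            skip = 3;
        } else if (startsWith(data, 0xFF, 0xFE)) {       // UTF-16LE BOM
            cs = StandardCharsets.UTF_16LE;
            skip = 2;
        } else if (startsWith(data, 0xFE, 0xFF)) {       // UTF-16BE BOM
            cs = StandardCharsets.UTF_16BE;
            skip = 2;
        }
        // Skip the BOM bytes (if any) and decode the rest.
        return new String(data, skip, data.length - skip, cs);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        byte[] noBom = {(byte) 0xE4}; // a single ISO-8859-1 byte, no BOM
        System.out.println(decode(withBom, StandardCharsets.ISO_8859_1));
        System.out.println(decode(noBom, StandardCharsets.ISO_8859_1));
    }
}
```

With the provider on the class path, the same effect would presumably be obtained by passing a name such as UTF-BOM.ISO-8859-1 to Charset.forName.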

This provider also provides charsets UTF-8-BOM, UTF-16LE-BOM and UTF-16BE-BOM, which act like their counterparts without -BOM, but will add a byte order mark when encoding and strip it when decoding (if present).
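As a rough illustration of the "add on encode, strip on decode" behaviour for UTF-8, the following sketch uses only standard JDK classes. The class Utf8BomDemo and its methods are hypothetical names chosen for this example; the actual charset works through the Charset encoder/decoder API rather than on whole arrays.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; Utf8BomDemo is a hypothetical name.
final class Utf8BomDemo {
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Encodes text as UTF-8 and prepends the byte order mark. */
    static byte[] encode(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[BOM.length + body.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(body, 0, out, BOM.length, body.length);
        return out;
    }

    /** Decodes UTF-8 data, dropping a leading byte order mark if one is present. */
    static String decode(byte[] data) {
        boolean hasBom = data.length >= BOM.length
                && data[0] == BOM[0] && data[1] == BOM[1] && data[2] == BOM[2];
        int skip = hasBom ? BOM.length : 0;
        return new String(data, skip, data.length - skip, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encode("abc");
        System.out.println(encoded.length);   // 3 BOM bytes + 3 ASCII bytes
        System.out.println(decode(encoded));
    }
}
```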

Two additional charsets, UTF-8-Binary and UTF-8-Binary-PUA, are supersets of UTF-8 that are binary safe on decoding (i.e. every byte sequence remains intact when it is decoded and encoded again). UTF-8-Binary uses unpaired surrogates in the range U+DC80 to U+DCFF, as suggested in the UTF-8 Wikipedia article. UTF-8-Binary-PUA uses code points U+E980 to U+E9FF from the Private Use Area instead, and escapes those code points (and the escape character itself) with a U+E97F character where needed.
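The unpaired-surrogate idea can be sketched as follows. This is a simplified illustration under assumptions, not the library's code: the class BinarySafeUtf8Demo is hypothetical, it works on whole arrays, and it maps each byte that the JDK's strict UTF-8 decoder rejects to the code unit U+DC00 plus the byte value (which lands in U+DC80 to U+DCFF, since such bytes are never ASCII).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; BinarySafeUtf8Demo is a hypothetical name.
final class BinarySafeUtf8Demo {

    /** Decodes UTF-8, turning every undecodable byte into an unpaired low surrogate. */
    static String decode(byte[] data) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // Every input byte yields at most one char, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        while (true) {
            CoderResult r = dec.decode(in, out, true);
            if (!r.isError()) {
                break; // all input consumed
            }
            // The buffer position is at the start of the bad input:
            // escape one byte, then let the decoder retry from the next one.
            out.put((char) (0xDC00 | (in.get() & 0xFF)));
        }
        out.flip();
        return out.toString();
    }

    /** Encodes to UTF-8, turning escape surrogates back into their raw bytes. */
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i); // a valid surrogate pair is read as one code point
            if (cp >= 0xDC80 && cp <= 0xDCFF) {
                out.write(cp & 0xFF);     // unpaired escape surrogate: restore the raw byte
            } else {
                // Note: other lone surrogates are replaced by '?' by the JDK encoder.
                byte[] b = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(b, 0, b.length);
            }
            i += Character.charCount(cp);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = {'A', (byte) 0xFF, (byte) 0xC3, (byte) 0xA4, (byte) 0x80};
        String s = decode(raw);
        System.out.println(java.util.Arrays.equals(raw, encode(s)));
    }
}
```

The PUA variant would follow the same pattern with U+E980 to U+E9FF as the escape range, plus the additional U+E97F escaping of any genuine occurrences of those code points; that part is not shown here.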

This class is loaded automatically via SPI when it is on the class path.
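In standard Java, charset providers are registered through a provider-configuration file named META-INF/services/java.nio.charset.spi.CharsetProvider that lists the provider class, after which Charset.forName can resolve the provider's names. The following sketch shows one cautious way to look up such a charset; SpiLookupDemo and lookup are hypothetical names, and the fallback branch exists so that the example also runs when the provider is not on the class path.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; SpiLookupDemo is a hypothetical name.
final class SpiLookupDemo {

    /** Returns the named charset if some installed provider supports it, else the fallback. */
    static Charset lookup(String name, Charset fallback) {
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }

    public static void main(String[] args) {
        // Resolves to the provider's charset only if the library is on the class path.
        Charset cs = lookup("UTF-BOM.ISO-8859-1", StandardCharsets.ISO_8859_1);
        System.out.println("Using " + cs.name());
    }
}
```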


Constructor Summary
UTFBOMCharsetsProvider()
           
 
Method Summary
 Charset charsetForName(String charsetName)
           
 Iterator<Charset> charsets()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UTFBOMCharsetsProvider

public UTFBOMCharsetsProvider()

Method Detail

charsetForName

public Charset charsetForName(String charsetName)
Specified by:
charsetForName in class CharsetProvider

charsets

public Iterator<Charset> charsets()
Specified by:
charsets in class CharsetProvider
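For readers unfamiliar with the CharsetProvider contract that these two methods implement, here is a minimal sketch of a provider. It is not how UTFBOMCharsetsProvider is written: DemoProvider and the name DEMO-LATIN1 are hypothetical, and the sketch simply hands out an existing JDK charset instead of defining a new one. The contract it illustrates is that charsetForName returns null for names the provider does not know, and charsets iterates over everything the provider offers.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Illustrative sketch only; DemoProvider and DEMO-LATIN1 are hypothetical.
final class DemoProvider extends CharsetProvider {
    private static final String NAME = "DEMO-LATIN1";

    @Override
    public Charset charsetForName(String charsetName) {
        // Return null for unknown names so that other providers can be consulted.
        return NAME.equalsIgnoreCase(charsetName) ? StandardCharsets.ISO_8859_1 : null;
    }

    @Override
    public Iterator<Charset> charsets() {
        return Collections.<Charset>singletonList(StandardCharsets.ISO_8859_1).iterator();
    }

    public static void main(String[] args) {
        DemoProvider p = new DemoProvider();
        System.out.println(p.charsetForName("demo-latin1"));
        System.out.println(p.charsetForName("something-else"));
    }
}
```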


Copyright © 2011. All Rights Reserved.