Class CharsetRecog_Unicode

java.lang.Object
com.ibm.icu.text.CharsetRecognizer
com.ibm.icu.text.CharsetRecog_Unicode
Direct Known Subclasses:
CharsetRecog_Unicode.CharsetRecog_UTF_16_BE, CharsetRecog_Unicode.CharsetRecog_UTF_16_LE, CharsetRecog_Unicode.CharsetRecog_UTF_32

abstract class CharsetRecog_Unicode extends CharsetRecognizer
This class matches UTF-16 and UTF-32 in both big-endian and little-endian byte order. If a byte order mark (BOM) is present, it is used to identify the encoding.
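The BOM check described above can be sketched with the standard library alone. This is a minimal illustration, not the ICU implementation: the class name BomSniffer and its methods are hypothetical, and only the BOM byte sequences themselves come from the Unicode standard.

```java
/** Hypothetical helper, not part of ICU: names the encoding implied by a leading BOM. */
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the encoding name signalled by a leading BOM, or null if none is found. */
    static String detect(byte[] in) {
        // Test the 4-byte UTF-32 marks first: the UTF-32LE BOM (FF FE 00 00)
        // begins with the same two bytes as the UTF-16LE BOM (FF FE).
        if (startsWith(in, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(in, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(in, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(in, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** True if the input begins with the given byte values (each in 0..255). */
    private static boolean startsWith(byte[] in, int... prefix) {
        if (in.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to 0..255 because Java bytes are signed.
            if ((in[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Note that FF FE 00 00 is ambiguous in principle: it could also be a UTF-16LE BOM followed by U+0000. The sketch resolves it in favour of UTF-32LE, and a real recognizer may weigh further evidence when no BOM is present.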
  • Constructor Details

    • CharsetRecog_Unicode

      CharsetRecog_Unicode()
  • Method Details

    • getName

      abstract String getName()
      Description copied from class: CharsetRecognizer
      Get the IANA name of this charset.
      Specified by:
      getName in class CharsetRecognizer
      Returns:
      the charset name.
    • match

      abstract CharsetMatch match(CharsetDetector det)
      Description copied from class: CharsetRecognizer
      Tests whether the input text, obtained from the CharsetDetector object, matches this charset.
      Specified by:
      match in class CharsetRecognizer
      Parameters:
      det - The CharsetDetector that holds the input text to be checked against this charset.
      Returns:
      A CharsetMatch object containing details of the match with this charset, or null if there was no match.
    • codeUnit16FromBytes

      static int codeUnit16FromBytes(byte hi, byte lo)
    • adjustConfidence

      static int adjustConfidence(int codeUnit, int confidence)
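This page gives only the signatures of the two static helpers, so the following is a sketch under stated assumptions rather than the library's code. codeUnit16FromBytes is assumed to combine a high and a low byte into one unsigned 16-bit code unit. The rule inside adjustConfidence is hypothetical: it lowers the score for a NUL code unit, raises it for code units in the range 0x20 to 0xFF or a line feed, and clamps the result to 0..100. The class name CodeUnitSketch is also invented for illustration.

```java
/** Illustrative stand-in for the two static helpers; not the ICU source. */
final class CodeUnitSketch {
    private CodeUnitSketch() {}

    /** Assumed behaviour: combine two bytes into one unsigned 16-bit code unit. */
    static int codeUnit16FromBytes(byte hi, byte lo) {
        // Java bytes are signed, so mask each to 0..255 before shifting,
        // otherwise sign extension would corrupt the upper bits.
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    /** Hypothetical rule: nudge a running confidence score by one code unit. */
    static int adjustConfidence(int codeUnit, int confidence) {
        if (codeUnit == 0) {
            // Assumption: a NUL code unit is unlikely in real text.
            confidence -= 10;
        } else if ((codeUnit >= 0x20 && codeUnit <= 0xFF) || codeUnit == 0x0A) {
            // Assumption: printable Latin-range units and line feeds look like text.
            confidence += 10;
        }
        // Keep the score within the 0..100 confidence range.
        return Math.max(0, Math.min(100, confidence));
    }
}
```

A caller could, for example, decode successive byte pairs with codeUnit16FromBytes(input[i], input[i + 1]) for big-endian data (swapping the arguments for little-endian data) and feed each code unit to adjustConfidence to accumulate a score.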