Qt 4.8
QGbkCodec Class Reference

The QGbkCodec class provides conversion to and from the Chinese GBK encoding.

#include <qgb18030codec.h>

Inheritance: QGbkCodec inherits QGb18030Codec, which inherits QTextCodec.

Public Functions

QList<QByteArray> aliases () const
 Subclasses can return a number of aliases for the codec in question.

QByteArray convertFromUnicode (const QChar *, int, ConverterState *) const
 Converts the given characters from Unicode to GBK. Reimplemented from QGb18030Codec.

QString convertToUnicode (const char *, int, ConverterState *) const
 QTextCodec subclasses must reimplement this function.

int mibEnum () const
 Subclasses of QTextCodec must reimplement this function.

QByteArray name () const
 QTextCodec subclasses must reimplement this function.

 QGbkCodec ()
 Constructs a QGbkCodec object.
 
- Public Functions inherited from QGb18030Codec
 QGb18030Codec ()
 
- Public Functions inherited from QTextCodec
bool canEncode (QChar) const
 Returns true if the Unicode character ch can be fully encoded with this codec; otherwise returns false.

bool canEncode (const QString &) const
 This is an overloaded member function, provided for convenience. It differs from the above function only in the argument(s) it accepts. s contains the string being tested for encodability.

QByteArray fromUnicode (const QString &uc) const
 Converts uc from Unicode to the encoding of this codec, and returns the result in a QByteArray.

QByteArray fromUnicode (const QChar *in, int length, ConverterState *state=0) const
 Converts the first length characters of in from Unicode to the encoding of this codec, and returns the result in a QByteArray.

QTextDecoder *makeDecoder () const
 Creates a QTextDecoder which stores enough state to decode chunks of char * data to create chunks of Unicode data.

QTextDecoder *makeDecoder (ConversionFlags flags) const

QTextEncoder *makeEncoder () const
 Creates a QTextEncoder which stores enough state to encode chunks of Unicode data as char * data.

QTextEncoder *makeEncoder (ConversionFlags flags) const

QString toUnicode (const QByteArray &) const
 Converts the byte array from the encoding of this codec to Unicode, and returns the result in a QString.

QString toUnicode (const char *chars) const
 This is an overloaded member function, provided for convenience. It differs from the above function only in the argument(s) it accepts. chars contains the source characters.

QString toUnicode (const char *in, int length, ConverterState *state=0) const
 Converts the first length bytes of in from the encoding of this codec to Unicode, and returns the result in a QString.
 

Static Public Functions

static QList<QByteArray> _aliases ()

static int _mibEnum ()

static QByteArray _name ()

- Static Public Functions inherited from QGb18030Codec
static QList<QByteArray> _aliases ()

static int _mibEnum ()

static QByteArray _name ()
 
- Static Public Functions inherited from QTextCodec
static QList<QByteArray> availableCodecs ()
 Returns the list of all available codecs, by name.

static QList<int> availableMibs ()
 Returns the list of MIBs for all available codecs.

static QTextCodec *codecForCStrings ()
 Returns the codec used by QString to convert to and from const char * and QByteArrays.

static QTextCodec *codecForHtml (const QByteArray &ba)
 Tries to detect the encoding of the provided snippet of HTML in the given byte array, ba, by checking the BOM (Byte Order Mark) and the content-type meta header, and returns a QTextCodec instance that is capable of decoding the HTML to Unicode. If the codec cannot be detected, this overload returns a Latin-1 QTextCodec.

static QTextCodec *codecForHtml (const QByteArray &ba, QTextCodec *defaultCodec)
 Tries to detect the encoding of the provided snippet of HTML in the given byte array, ba, by checking the BOM (Byte Order Mark) and the content-type meta header, and returns a QTextCodec instance that is capable of decoding the HTML to Unicode. If the codec cannot be detected, defaultCodec is returned.

static QTextCodec *codecForLocale ()
 Returns a pointer to the codec most suitable for this locale.

static QTextCodec *codecForMib (int mib)
 Returns the QTextCodec which matches the MIBenum mib.

static QTextCodec *codecForName (const QByteArray &name)
 Searches all installed QTextCodec objects and returns the one which best matches name; the match is case-insensitive.

static QTextCodec *codecForName (const char *name)
 Searches all installed QTextCodec objects and returns the one which best matches name; the match is case-insensitive.

static QTextCodec *codecForTr ()
 Returns the codec used by QObject::tr() on its argument.

static QTextCodec *codecForUtfText (const QByteArray &ba)
 Tries to detect the encoding of the provided snippet ba by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to Unicode. If the codec cannot be detected, this overload returns a Latin-1 QTextCodec.

static QTextCodec *codecForUtfText (const QByteArray &ba, QTextCodec *defaultCodec)
 Tries to detect the encoding of the provided snippet ba by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to Unicode. If the codec cannot be detected, defaultCodec is returned.

static void setCodecForCStrings (QTextCodec *c)

static void setCodecForLocale (QTextCodec *c)
 Sets the codec to c; this will be returned by codecForLocale().
 
static void setCodecForTr (QTextCodec *c)
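Both codecForUtfText() overloads decide from a leading byte order mark alone. As a rough standalone illustration (not Qt's implementation), the sketch below returns an encoding name for the common BOMs and otherwise falls back to a caller-supplied default, much as the defaultCodec overload does. detectByBom() and the returned name strings are assumptions for illustration; the real function returns a QTextCodec pointer and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Standalone approximation of BOM sniffing (hypothetical, not Qt code).
// Returns the detected encoding name, or defaultName if no BOM is present.
std::string detectByBom(const std::vector<unsigned char>& ba,
                        const std::string& defaultName) {
    const std::size_t n = ba.size();
    // UTF-32 is checked before UTF-16 because FF FE 00 00 also starts with FF FE.
    if (n >= 4 && ba[0] == 0x00 && ba[1] == 0x00 && ba[2] == 0xFE && ba[3] == 0xFF)
        return "UTF-32BE";
    if (n >= 4 && ba[0] == 0xFF && ba[1] == 0xFE && ba[2] == 0x00 && ba[3] == 0x00)
        return "UTF-32LE";
    if (n >= 2 && ba[0] == 0xFE && ba[1] == 0xFF)
        return "UTF-16BE";
    if (n >= 2 && ba[0] == 0xFF && ba[1] == 0xFE)
        return "UTF-16LE";
    if (n >= 3 && ba[0] == 0xEF && ba[1] == 0xBB && ba[2] == 0xBF)
        return "UTF-8";
    return defaultName;  // no BOM: fall back, e.g. to "GBK" for legacy files
}
```

Note that GBK text never starts with a BOM, so GBK input always takes the fallback path; a BOM check cannot by itself identify GBK.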
 

Additional Inherited Members

- Public Types inherited from QTextCodec
enum  ConversionFlag { DefaultConversion, ConvertInvalidToNull = 0x80000000, IgnoreHeader = 0x1, FreeFunction = 0x2 }
 
- Protected Functions inherited from QTextCodec
 QTextCodec ()
 Constructs a QTextCodec, and gives it the highest precedence.

virtual ~QTextCodec ()
 Destroys the QTextCodec.
 

Detailed Description

The QGbkCodec class provides conversion to and from the Chinese GBK encoding.

Note
This class is reentrant.
Warning
This class is not part of the public interface.

GBK, formally the Chinese Internal Code Specification, is a commonly used extension of GB 2312-80. Microsoft Windows uses it under the name code page 936.

The GBK encoding has been superseded by the GB18030 encoding, and GB18030 is backward compatible with GBK. For this reason, QGbkCodec is implemented in terms of the GB18030 codec and uses only its 1-byte and 2-byte portions for conversion to and from Unicode.

QGbkCodec is kept mainly for compatibility with older software.

Definition at line 70 of file qgb18030codec.h.
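Because GBK is the 1-byte and 2-byte portion of GB18030, the two can be told apart by byte ranges alone. The following standalone sketch assumes the standard ranges: a single byte is 0x00 to 0x7F, a lead byte is 0x81 to 0xFE, a two-byte trail is 0x40 to 0xFE except 0x7F, and GB18030 four-byte forms have 0x30 to 0x39 in the second and fourth positions. The helper names are hypothetical; Qt's own checks are the IsLatin, Is1stByte and Is2ndByteIn2Bytes macros in qgb18030codec.cpp, whose definitions are not shown on this page.

```cpp
#include <cassert>

// Byte-range predicates (standard GBK/GB18030 ranges, assumed here).
inline bool isSingleByte(unsigned char c)    { return c <= 0x7F; }
inline bool isLeadByte(unsigned char c)      { return c >= 0x81 && c <= 0xFE; }
inline bool isTwoByteTrail(unsigned char c)  { return c >= 0x40 && c <= 0xFE && c != 0x7F; }
inline bool isFourByteDigit(unsigned char c) { return c >= 0x30 && c <= 0x39; }

// Length of the GB18030 sequence at p (1, 2 or 4), or 0 if the bytes are
// invalid or the sequence is incomplete within `available` bytes.
int gb18030SequenceLength(const unsigned char* p, int available) {
    if (available < 1) return 0;
    if (isSingleByte(p[0])) return 1;
    if (!isLeadByte(p[0]) || available < 2) return 0;
    if (isTwoByteTrail(p[1])) return 2;
    if (isFourByteDigit(p[1]) && available >= 4 &&
        isLeadByte(p[2]) && isFourByteDigit(p[3]))
        return 4;
    return 0;
}

// GBK accepts only the 1-byte and 2-byte subset of GB18030.
int gbkSequenceLength(const unsigned char* p, int available) {
    const int n = gb18030SequenceLength(p, available);
    return (n == 1 || n == 2) ? n : 0;
}
```

These checks are structural only: a well-formed pair is not guaranteed to map to a character, which is why convertToUnicode() also checks that qt_Gb18030ToUnicode() consumed exactly two bytes.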

Constructors and Destructors

◆ QGbkCodec()

QGbkCodec::QGbkCodec ( )

Constructs a QGbkCodec object.

Definition at line 297 of file qgb18030codec.cpp.

QGbkCodec::QGbkCodec()
    : QGb18030Codec()
{
}

Functions

◆ _aliases()

QList< QByteArray > QGbkCodec::_aliases ( )
static

Definition at line 312 of file qgb18030codec.cpp.

Referenced by CNTextCodecs::aliases(), and CNTextCodecs::createForName().

QList<QByteArray> QGbkCodec::_aliases()
{
    QList<QByteArray> list;
    list << "CP936"
         << "MS936"
         << "windows-936";
    return list;
}
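codecForName() matches requested names case-insensitively against each codec's name and aliases. The sketch below checks a requested name against GBK's canonical name and the aliases listed above using plain ASCII case folding. isGbkName() is a hypothetical helper, and Qt's real matching in QTextCodec may normalize names further than this.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// ASCII case-insensitive equality (a simplification; Qt's matching may differ).
bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return false;
    }
    return true;
}

// Hypothetical lookup: does `requested` name the GBK codec?
// The list mirrors QGbkCodec::_name() followed by QGbkCodec::_aliases().
bool isGbkName(const std::string& requested) {
    static const std::vector<std::string> names = {
        "GBK", "CP936", "MS936", "windows-936"};
    for (const std::string& n : names) {
        if (equalsIgnoreCase(requested, n)) return true;
    }
    return false;
}
```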

◆ _mibEnum()

int QGbkCodec::_mibEnum ( )
static

Definition at line 302 of file qgb18030codec.cpp.

Referenced by CNTextCodecs::createForMib(), and CNTextCodecs::mibEnums().

int QGbkCodec::_mibEnum()
{
    return 113;
}
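The value 113 is the IANA MIBenum for GBK, and it is the key codecForMib() uses to find this codec. The sketch below maps MIB values to names for GBK and two related Chinese encodings. nameForMib() and its table are illustrative assumptions; 114 (GB18030) and 2025 (GB2312) come from the IANA character-sets registry, not from this page.

```cpp
#include <cassert>
#include <string>

// Hypothetical MIBenum -> name table for the Chinese GB encodings.
// 113 matches QGbkCodec::_mibEnum(); 114 and 2025 are the IANA values
// for GB18030 and GB2312.
std::string nameForMib(int mib) {
    switch (mib) {
    case 113:  return "GBK";
    case 114:  return "GB18030";
    case 2025: return "GB2312";
    default:   return std::string();  // not in this toy table
    }
}
```

Because codecs are looked up by this number, each QTextCodec subclass must return a unique MIB value.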

◆ _name()

QByteArray QGbkCodec::_name ( )
static

Definition at line 307 of file qgb18030codec.cpp.

Referenced by CNTextCodecs::createForName(), and CNTextCodecs::names().

QByteArray QGbkCodec::_name()
{
    return "GBK";
}

◆ aliases()

QList<QByteArray> QGbkCodec::aliases ( ) const
inline virtual

Subclasses can return a number of aliases for the codec in question.

Standard aliases for codecs can be found in the IANA character-sets encoding file.

Reimplemented from QGb18030Codec.

Definition at line 79 of file qgb18030codec.h.

QList<QByteArray> aliases() const { return _aliases(); }

◆ convertFromUnicode()

QByteArray QGbkCodec::convertFromUnicode ( const QChar *  uc,
int  len,
ConverterState *  state 
) const
virtual

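Converts the first len characters of uc from Unicode to GBK and returns the resulting bytes. ASCII characters are copied unchanged, characters that qt_UnicodeToGbk() can map are written as two bytes, and any other character is replaced with '?' (or with a null byte if state->flags contains ConvertInvalidToNull) and, when state is not 0, counted in state->invalidChars.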

Reimplemented from QGb18030Codec.

Definition at line 396 of file qgb18030codec.cpp.

QByteArray QGbkCodec::convertFromUnicode(const QChar *uc, int len, ConverterState *state) const
{
    char replacement = '?';
    if (state) {
        if (state->flags & ConvertInvalidToNull)
            replacement = 0;
    }
    int invalid = 0;

    // Each input character produces at most two output bytes.
    int rlen = 2*len + 1;
    QByteArray rstr;
    rstr.resize(rlen);
    uchar* cursor = (uchar*)rstr.data();

    for (int i = 0; i < len; i++) {
        QChar ch = uc[i];
        uchar buf[2];

        if (ch.row() == 0x00 && ch.cell() < 0x80) {
            // ASCII
            *cursor++ = ch.cell();
        } else if (qt_UnicodeToGbk(ch.unicode(), buf) == 2) {
            // Two-byte GBK sequence
            *cursor++ = buf[0];
            *cursor++ = buf[1];
        } else {
            // Not representable in GBK: write the replacement byte and advance
            *cursor++ = replacement;
            ++invalid;
        }
    }
    rstr.resize(cursor - (uchar*)rstr.constData());

    if (state) {
        state->invalidChars += invalid;
    }
    return rstr;
}
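The encoding rules above can be reproduced without Qt. The sketch below follows the same loop as convertFromUnicode(): ASCII is copied, mappable characters become two bytes, and everything else becomes '?' or a null byte and is counted as invalid. toyUnicodeToGbk() is a hypothetical one-entry stand-in for qt_UnicodeToGbk() (it knows only U+4E2D, GBK 0xD6 0xD0), and ToyState is a hypothetical stand-in for ConverterState. Like the original, it handles each UTF-16 code unit on its own, so surrogate pairs are not combined.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for qt_UnicodeToGbk(): fills buf and returns 2 on
// success, 0 otherwise. This toy table knows only U+4E2D -> 0xD6 0xD0.
int toyUnicodeToGbk(std::uint16_t u, unsigned char buf[2]) {
    if (u == 0x4E2D) {
        buf[0] = 0xD6;
        buf[1] = 0xD0;
        return 2;
    }
    return 0;
}

// Hypothetical stand-in for the parts of ConverterState used here.
struct ToyState {
    bool convertInvalidToNull = false;  // models the ConvertInvalidToNull flag
    int invalidChars = 0;
};

// Same loop shape as QGbkCodec::convertFromUnicode(), one UTF-16 unit at a time.
std::string encodeGbk(const std::u16string& in, ToyState* state) {
    const char replacement = (state && state->convertInvalidToNull) ? '\0' : '?';
    std::string out;
    out.reserve(2 * in.size() + 1);  // at most two bytes per input unit
    int invalid = 0;
    for (char16_t ch : in) {
        unsigned char buf[2];
        if (ch < 0x80) {
            out += static_cast<char>(ch);       // ASCII passes through
        } else if (toyUnicodeToGbk(ch, buf) == 2) {
            out += static_cast<char>(buf[0]);   // two-byte GBK sequence
            out += static_cast<char>(buf[1]);
        } else {
            out += replacement;                 // unmappable: emit replacement
            ++invalid;
        }
    }
    if (state) state->invalidChars += invalid;
    return out;
}
```

For example, encoding "A", U+4E2D and the Thai letter U+0E01 yields four bytes: 'A', 0xD6, 0xD0 and '?', with one invalid character recorded.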

◆ convertToUnicode()

QString QGbkCodec::convertToUnicode ( const char *  chars,
int  len,
ConverterState *  state 
) const
virtual

QTextCodec subclasses must reimplement this function.

Converts the first len characters of chars from the encoding of the subclass to Unicode, and returns the result in a QString.

state can be 0, in which case the conversion is stateless and default conversion rules should be used. If state is not 0, the codec should save the state after the conversion in state, and adjust the remainingChars and invalidChars members of the struct.

Reimplemented from QGb18030Codec.

Definition at line 321 of file qgb18030codec.cpp.

QString QGbkCodec::convertToUnicode(const char* chars, int len, ConverterState *state) const
{
    uchar buf[2];
    int nbuf = 0;
    ushort replacement = QChar::ReplacementCharacter;
    if (state) {
        if (state->flags & ConvertInvalidToNull)
            replacement = QChar::Null;
        nbuf = state->remainingChars;
        buf[0] = state->state_data[0];
        buf[1] = state->state_data[1];
    }
    int invalid = 0;

    QString result;
    result.resize(len);
    int unicodeLen = 0;
    ushort *const resultData = reinterpret_cast<ushort*>(result.data());

    for (int i=0; i<len; i++) {
        uchar ch = chars[i];
        switch (nbuf) {
        case 0:
            if (IsLatin(ch)) {
                // ASCII
                resultData[unicodeLen] = ch;
                ++unicodeLen;
            } else if (Is1stByte(ch)) {
                // GBK 1st byte?
                buf[0] = ch;
                nbuf = 1;
            } else {
                // Invalid
                resultData[unicodeLen] = replacement;
                ++unicodeLen;
                ++invalid;
            }
            break;
        case 1:
            // GBK 2nd byte
            if (Is2ndByteIn2Bytes(ch)) {
                buf[1] = ch;
                int clen = 2;
                uint u = qt_Gb18030ToUnicode(buf, clen);
                if (clen == 2) {
                    resultData[unicodeLen] = qValidChar(static_cast<ushort>(u));
                    ++unicodeLen;
                } else {
                    resultData[unicodeLen] = replacement;
                    ++unicodeLen;
                    ++invalid;
                }
                nbuf = 0;
            } else {
                // Error
                resultData[unicodeLen] = replacement;
                ++unicodeLen;
                ++invalid;
                nbuf = 0;
            }
            break;
        }
    }
    result.resize(unicodeLen);

    if (state) {
        state->remainingChars = nbuf;
        state->state_data[0] = buf[0];
        state->state_data[1] = buf[1];
        state->invalidChars += invalid;
    }
    return result;
}
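The role of state->remainingChars is easiest to see when a two-byte character is split across two input chunks. The sketch below mimics the state machine of convertToUnicode() in plain C++: a lead byte at the end of one chunk is kept in the state and completed by the next call. toyGbkToUnicode() is a hypothetical one-entry stand-in for qt_Gb18030ToUnicode() (it knows only 0xD6 0xD0 -> U+4E2D), ToyDecoderState is a hypothetical stand-in for ConverterState, and the byte ranges are the standard GBK ones rather than Qt's actual macros.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two-byte table: knows only D6 D0 -> U+4E2D.
std::uint16_t toyGbkToUnicode(unsigned char b1, unsigned char b2) {
    return (b1 == 0xD6 && b2 == 0xD0) ? 0x4E2D : 0;
}

// Hypothetical stand-in for ConverterState.
struct ToyDecoderState {
    int remainingChars = 0;     // 1 while a lead byte is pending
    unsigned char pending = 0;  // the pending lead byte (cf. state_data[0])
    int invalidChars = 0;
};

// Decodes one chunk; a trailing lead byte is carried over in `st`.
std::u16string decodeGbkChunk(const std::string& chunk, ToyDecoderState& st) {
    const char16_t replacement = 0xFFFD;  // like QChar::ReplacementCharacter
    std::u16string out;
    for (char c : chunk) {
        const unsigned char ch = static_cast<unsigned char>(c);
        if (st.remainingChars == 0) {
            if (ch <= 0x7F) {
                out += static_cast<char16_t>(ch);  // ASCII
            } else if (ch >= 0x81 && ch <= 0xFE) {
                st.pending = ch;                   // lead byte: wait for trail
                st.remainingChars = 1;
            } else {
                out += replacement;                // invalid single byte
                ++st.invalidChars;
            }
        } else {
            const bool trailOk = ch >= 0x40 && ch <= 0xFE && ch != 0x7F;
            const std::uint16_t u = trailOk ? toyGbkToUnicode(st.pending, ch) : 0;
            if (u != 0) {
                out += static_cast<char16_t>(u);
            } else {
                out += replacement;                // bad or unmapped pair
                ++st.invalidChars;
            }
            st.remainingChars = 0;
        }
    }
    return out;
}
```

Feeding "A" plus 0xD6 and then 0xD0 in separate calls yields "A" from the first call (with one byte pending) and U+4E2D from the second, with no invalid characters.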

◆ mibEnum()

int QGbkCodec::mibEnum ( ) const
inline virtual

Subclasses of QTextCodec must reimplement this function.

It returns the MIBenum (see the IANA character-sets encoding file for more information). It is important that each QTextCodec subclass returns the correct unique value for this function.

Reimplemented from QGb18030Codec.

Definition at line 80 of file qgb18030codec.h.

int mibEnum() const { return _mibEnum(); }

◆ name()

QByteArray QGbkCodec::name ( ) const
inline virtual

QTextCodec subclasses must reimplement this function.

It returns the name of the encoding supported by the subclass.

If the codec is registered as a character set in the IANA character-sets encoding file, this method should return the preferred MIME name for the codec if one is defined, and otherwise its name.

Reimplemented from QGb18030Codec.

Definition at line 78 of file qgb18030codec.h.

QByteArray name() const { return _name(); }

The documentation for this class was generated from the following files: