Qt 4.8
QGb18030Codec Class Reference

#include <qgb18030codec.h>

Inherits QTextCodec. Inherited by QGb2312Codec and QGbkCodec.

Public Functions

QList< QByteArray > aliases () const
 Subclasses can return a number of aliases for the codec in question. More...
 
QByteArray convertFromUnicode (const QChar *, int, ConverterState *) const
 Reimplemented Function More...
 
QString convertToUnicode (const char *, int, ConverterState *) const
 QTextCodec subclasses must reimplement this function. More...
 
int mibEnum () const
 Subclasses of QTextCodec must reimplement this function. More...
 
QByteArray name () const
 QTextCodec subclasses must reimplement this function. More...
 
 QGb18030Codec ()
 
- Public Functions inherited from QTextCodec
bool canEncode (QChar) const
 Returns true if the Unicode character ch can be fully encoded with this codec; otherwise returns false. More...
 
bool canEncode (const QString &) const
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. s contains the string being tested for encode-ability. More...
 
QByteArray fromUnicode (const QString &uc) const
 Converts str from Unicode to the encoding of this codec, and returns the result in a QByteArray. More...
 
QByteArray fromUnicode (const QChar *in, int length, ConverterState *state=0) const
 Converts the first number of characters from the input array from Unicode to the encoding of this codec, and returns the result in a QByteArray. More...
 
QTextDecoder * makeDecoder () const
 Creates a QTextDecoder which stores enough state to decode chunks of char * data to create chunks of Unicode data. More...
 
QTextDecoder * makeDecoder (ConversionFlags flags) const
 
QTextEncoder * makeEncoder () const
 Creates a QTextEncoder which stores enough state to encode chunks of Unicode data as char * data. More...
 
QTextEncoder * makeEncoder (ConversionFlags flags) const
 
QString toUnicode (const QByteArray &) const
 Converts a from the encoding of this codec to Unicode, and returns the result in a QString. More...
 
QString toUnicode (const char *chars) const
 This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. chars contains the source characters. More...
 
QString toUnicode (const char *in, int length, ConverterState *state=0) const
 Converts the first size characters from the input from the encoding of this codec to Unicode, and returns the result in a QString. More...
 

Static Public Functions

static QList< QByteArray > _aliases ()
 
static int _mibEnum ()
 
static QByteArray _name ()
 
- Static Public Functions inherited from QTextCodec
static QList< QByteArray > availableCodecs ()
 Returns the list of all available codecs, by name. More...
 
static QList< int > availableMibs ()
 Returns the list of MIBs for all available codecs. More...
 
static QTextCodec * codecForCStrings ()
 Returns the codec used by QString to convert to and from const char * and QByteArrays. More...
 
static QTextCodec * codecForHtml (const QByteArray &ba)
 Tries to detect the encoding of the provided snippet of HTML in the given byte array, ba, by checking the BOM (Byte Order Mark) and the content-type meta header and returns a QTextCodec instance that is capable of decoding the html to unicode. More...
 
static QTextCodec * codecForHtml (const QByteArray &ba, QTextCodec *defaultCodec)
 Tries to detect the encoding of the provided snippet of HTML in the given byte array, ba, by checking the BOM (Byte Order Mark) and the content-type meta header and returns a QTextCodec instance that is capable of decoding the html to unicode. More...
 
static QTextCodec * codecForLocale ()
 Returns a pointer to the codec most suitable for this locale. More...
 
static QTextCodec * codecForMib (int mib)
 Returns the QTextCodec which matches the MIBenum mib. More...
 
static QTextCodec * codecForName (const QByteArray &name)
 Searches all installed QTextCodec objects and returns the one which best matches name; the match is case-insensitive. More...
 
static QTextCodec * codecForName (const char *name)
 Searches all installed QTextCodec objects and returns the one which best matches name; the match is case-insensitive. More...
 
static QTextCodec * codecForTr ()
 Returns the codec used by QObject::tr() on its argument. More...
 
static QTextCodec * codecForUtfText (const QByteArray &ba)
 Tries to detect the encoding of the provided snippet ba by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to unicode. More...
 
static QTextCodec * codecForUtfText (const QByteArray &ba, QTextCodec *defaultCodec)
 Tries to detect the encoding of the provided snippet ba by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to unicode. More...
 
static void setCodecForCStrings (QTextCodec *c)
 
static void setCodecForLocale (QTextCodec *c)
 Set the codec to c; this will be returned by codecForLocale(). More...
 
static void setCodecForTr (QTextCodec *c)
 

Additional Inherited Members

- Public Types inherited from QTextCodec
enum  ConversionFlag { DefaultConversion, ConvertInvalidToNull = 0x80000000, IgnoreHeader = 0x1, FreeFunction = 0x2 }
 
- Protected Functions inherited from QTextCodec
 QTextCodec ()
 Constructs a QTextCodec, and gives it the highest precedence. More...
 
virtual ~QTextCodec ()
 Destroys the QTextCodec. More...
 

Detailed Description

Note
This class or function is reentrant.
Warning
This function is not part of the public interface.

Definition at line 54 of file qgb18030codec.h.

Constructors and Destructors

◆ QGb18030Codec()

QGb18030Codec::QGb18030Codec ( )

Definition at line 83 of file qgb18030codec.cpp.

{
}

Functions

◆ _aliases()

static QList<QByteArray> QGb18030Codec::_aliases ( )
inline static

◆ _mibEnum()

static int QGb18030Codec::_mibEnum ( )
inline static

◆ _name()

static QByteArray QGb18030Codec::_name ( )
inline static

◆ aliases()

QList<QByteArray> QGb18030Codec::aliases ( ) const
inline virtual

Subclasses can return a number of aliases for the codec in question.

Standard aliases for codecs can be found in the IANA character-sets encoding file.

Reimplemented from QTextCodec.

Reimplemented in QGbkCodec.

Definition at line 63 of file qgb18030codec.h.

{ return _aliases(); }

◆ convertFromUnicode()

QByteArray QGb18030Codec::convertFromUnicode ( const QChar *  uc,
int  len,
ConverterState *  state
) const
virtual

Reimplemented Function

Implements QTextCodec.

Reimplemented in QGb2312Codec, and QGbkCodec.

Definition at line 88 of file qgb18030codec.cpp.

Referenced by mibEnum(), QGbkCodec::mibEnum(), QGb2312Codec::mibEnum(), QFontGb2312Codec::mibEnum(), QFontGbkCodec::mibEnum(), and QFontGb18030_0Codec::mibEnum().

{
    char replacement = '?';
    int high = -1;
    if (state) {
        if (state->flags & ConvertInvalidToNull)
            replacement = 0;
        if (state->remainingChars)
            high = state->state_data[0];
    }
    int invalid = 0;

    int rlen = 4 *len + 1;
    QByteArray rstr;
    rstr.resize(rlen);
    uchar* cursor = (uchar*)rstr.data();

    //qDebug("QGb18030Codec::fromUnicode(const QString& uc, int& lenInOut = %d)", lenInOut);
    for (int i = 0; i < len; i++) {
        unsigned short ch = uc[i].unicode();
        int len;
        uchar buf[4];
        if (high >= 0) {
            if (uc[i].isLowSurrogate()) {
                // valid surrogate pair
                ++i;
                uint u = QChar::surrogateToUcs4(high, uc[i].unicode());
                len = qt_UnicodeToGb18030(u, buf);
                if (len >= 2) {
                    for (int j=0; j<len; j++)
                        *cursor++ = buf[j];
                } else {
                    *cursor++ = replacement;
                    ++invalid;
                }
                high = -1;
                continue;
            } else {
                *cursor++ = replacement;
                ++invalid;
                high = -1;
            }
        }

        if (IsLatin(ch)) {
            // ASCII
            *cursor++ = ch;
        } else if (uc[i].isHighSurrogate()) {
            // surrogates area. check for correct encoding
            // we need at least one more character, first the high surrogate, then the low one
            high = ch;
        } else if ((len = qt_UnicodeToGb18030(ch, buf)) >= 2) {
            for (int j=0; j<len; j++)
                *cursor++ = buf[j];
        } else {
            // Error
            *cursor++ = replacement;
            ++invalid;
        }
    }
    rstr.resize(cursor - (uchar*)rstr.constData());

    if (state) {
        state->invalidChars += invalid;
        state->state_data[0] = high;
        if (high)
            state->remainingChars = 1;
    }
    return rstr;
}
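
The encoder above remembers a pending high surrogate in high and, once the matching low surrogate arrives, merges the pair into a single code point before converting it. As a rough illustration of that merge step only, here is a minimal standalone sketch using the standard UTF-16 arithmetic. The helper names (isHighSurrogate, isLowSurrogate, combineSurrogates) are hypothetical and are not the Qt functions; the sketch does not perform any GB18030 mapping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these are not Qt API.
inline bool isHighSurrogate(std::uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(std::uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a valid UTF-16 surrogate pair into one code point
// in the range U+10000..U+10FFFF.
inline std::uint32_t combineSurrogates(std::uint16_t high, std::uint16_t low)
{
    // Precondition: the caller has already classified both units.
    assert(isHighSurrogate(high) && isLowSurrogate(low));
    return 0x10000u
         + ((std::uint32_t(high) - 0xD800u) << 10)
         + (std::uint32_t(low) - 0xDC00u);
}
```

For example, the pair 0xD83D, 0xDE00 combines to U+1F600. A high surrogate that is not followed by a low surrogate has no code point, which is the case the listing answers with the replacement byte and an incremented invalid count.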

◆ convertToUnicode()

QString QGb18030Codec::convertToUnicode ( const char *  chars,
int  len,
ConverterState *  state
) const
virtual

QTextCodec subclasses must reimplement this function.

Converts the first len characters of chars from the encoding of the subclass to Unicode, and returns the result in a QString.

state can be 0, in which case the conversion is stateless and default conversion rules should be used. If state is not 0, the codec should save the state after the conversion in state, and adjust the remainingChars and invalidChars members of the struct.
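
One way to picture the saved state: when a multi-byte sequence is split across two calls, the bytes seen so far must survive until the next call. The sketch below is a standalone illustration, not Qt code. PendingBytes is a hypothetical stand-in for the two ConverterState fields involved (count for remainingChars, packed for state_data[0]), and savePending and restorePending are assumed names. Unlike the listing for this function, the sketch packs only the bytes that are actually pending.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ConverterState fields used by the decoder.
struct PendingBytes {
    int count;             // number of buffered bytes (0..3)
    std::uint32_t packed;  // buffered bytes, first byte in the top 8 bits
};

// Pack the first `count` bytes of buf into a 32-bit word, big-endian style.
inline PendingBytes savePending(const unsigned char buf[4], int count)
{
    assert(count >= 0 && count <= 3);
    std::uint32_t packed = 0;
    for (int i = 0; i < count; ++i)
        packed |= std::uint32_t(buf[i]) << (24 - 8 * i);
    return PendingBytes{count, packed};
}

// Unpack the word back into buf and return how many bytes are pending.
inline int restorePending(const PendingBytes &s, unsigned char buf[4])
{
    for (int i = 0; i < 4; ++i)
        buf[i] = static_cast<unsigned char>((s.packed >> (24 - 8 * i)) & 0xFF);
    return s.count;
}
```

A caller that feeds data in chunks would restore the pending bytes at the start of each call and save them again at the end, so a sequence cut at a chunk boundary is completed by the next chunk instead of being reported as invalid.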

Implements QTextCodec.

Reimplemented in QGb2312Codec, and QGbkCodec.

Definition at line 159 of file qgb18030codec.cpp.

Referenced by mibEnum(), QGbkCodec::mibEnum(), QGb2312Codec::mibEnum(), QFontGb2312Codec::mibEnum(), QFontGbkCodec::mibEnum(), and QFontGb18030_0Codec::mibEnum().

{
    uchar buf[4];
    int nbuf = 0;
    ushort replacement = QChar::ReplacementCharacter;
    if (state) {
        if (state->flags & ConvertInvalidToNull)
            replacement = QChar::Null;
        nbuf = state->remainingChars;
        buf[0] = (state->state_data[0] >> 24) & 0xff;
        buf[1] = (state->state_data[0] >> 16) & 0xff;
        buf[2] = (state->state_data[0] >> 8) & 0xff;
        buf[3] = (state->state_data[0] >> 0) & 0xff;
    }
    int invalid = 0;

    QString result;
    result.resize(len);
    int unicodeLen = 0;
    ushort *const resultData = reinterpret_cast<ushort*>(result.data());
    //qDebug("QGb18030Decoder::toUnicode(const char* chars, int len = %d)", len);
    for (int i = 0; i < len; i++) {
        uchar ch = chars[i];
        switch (nbuf) {
        case 0:
            if (IsLatin(ch)) {
                // ASCII
                resultData[unicodeLen] = ch;
                ++unicodeLen;
            } else if (Is1stByte(ch)) {
                // GB18030?
                buf[0] = ch;
                nbuf = 1;
            } else {
                // Invalid
                resultData[unicodeLen] = replacement;
                ++unicodeLen;
                ++invalid;
            }
            break;
        case 1:
            // GB18030 2 bytes
            if (Is2ndByteIn2Bytes(ch)) {
                buf[1] = ch;
                int clen = 2;
                uint u = qt_Gb18030ToUnicode(buf, clen);
                if (clen == 2) {
                    resultData[unicodeLen] = qValidChar(static_cast<ushort>(u));
                    ++unicodeLen;
                } else {
                    resultData[unicodeLen] = replacement;
                    ++unicodeLen;
                    ++invalid;
                }
                nbuf = 0;
            } else if (Is2ndByteIn4Bytes(ch)) {
                buf[1] = ch;
                nbuf = 2;
            } else {
                // Error
                resultData[unicodeLen] = replacement;
                ++unicodeLen;
                ++invalid;
                nbuf = 0;
            }
            break;
        case 2:
            // GB18030 3 bytes
            if (Is3rdByte(ch)) {
                buf[2] = ch;
                nbuf = 3;
            } else {
                resultData[unicodeLen] = replacement;
                ++unicodeLen;
                ++invalid;
                nbuf = 0;
            }
            break;
        case 3:
            // GB18030 4 bytes
            if (Is4thByte(ch)) {
                buf[3] = ch;
                int clen = 4;
                uint u = qt_Gb18030ToUnicode(buf, clen);
                if (clen == 4) {
                    resultData[unicodeLen] = qValidChar(u);
                    ++unicodeLen;
                } else {
                    resultData[unicodeLen] = replacement;
                    ++unicodeLen;
                    ++invalid;
                }
            } else {
                resultData[unicodeLen] = replacement;
                ++unicodeLen;
                ++invalid;
            }
            nbuf = 0;
            break;
        }
    }
    result.resize(unicodeLen);

    if (state) {
        state->remainingChars = nbuf;
        state->state_data[0] = (buf[0] << 24) + (buf[1] << 16) + (buf[2] << 8) + buf[3];
        state->invalidChars += invalid;
    }
    return result;
}
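
The decoder above is a byte-at-a-time state machine driven by byte-class tests (IsLatin, Is1stByte, Is2ndByteIn2Bytes and so on), whose definitions are not shown on this page. The following standalone sketch illustrates the same classification idea using the byte ranges of the GB18030 encoding scheme: one byte for ASCII, two bytes with a trail byte in 0x40..0x7E or 0x80..0xFE, or four bytes alternating 0x81..0xFE and 0x30..0x39. All names here are hypothetical, and the ranges are an assumption taken from the encoding scheme rather than copied from the Qt macros.

```cpp
#include <cstddef>

// Byte-range predicates for GB18030 (illustrative names, not the Qt macros).
inline bool inRange(unsigned char c, unsigned char lo, unsigned char hi)
{
    return c >= lo && c <= hi;
}
inline bool isAscii(unsigned char c)      { return c <= 0x7F; }
inline bool isLeadByte(unsigned char c)   { return inRange(c, 0x81, 0xFE); }
inline bool isTrailOfTwo(unsigned char c) { return inRange(c, 0x40, 0xFE) && c != 0x7F; }
inline bool isDigitByte(unsigned char c)  { return inRange(c, 0x30, 0x39); }

// Returns 1, 2 or 4 for a structurally well-formed sequence at the start of
// [p, p + n), or 0 if the bytes are malformed or the buffer ends mid-sequence.
// Only the byte structure is checked, not whether the code is assigned.
inline int gb18030SequenceLength(const unsigned char *p, std::size_t n)
{
    if (n == 0)
        return 0;
    if (isAscii(p[0]))
        return 1;
    if (!isLeadByte(p[0]) || n < 2)
        return 0;
    if (isTrailOfTwo(p[1]))
        return 2;
    if (!isDigitByte(p[1]) || n < 4)
        return 0;
    return (isLeadByte(p[2]) && isDigitByte(p[3])) ? 4 : 0;
}
```

This whole-buffer form returns 0 for a truncated sequence, whereas the streaming decoder in the listing keeps the partial bytes in buf and nbuf so that a later call can finish the sequence.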

◆ mibEnum()

int QGb18030Codec::mibEnum ( ) const
inline virtual

Subclasses of QTextCodec must reimplement this function.

It returns the MIBenum (see IANA character-sets encoding file for more information). It is important that each QTextCodec subclass returns the correct unique value for this function.

Implements QTextCodec.

Reimplemented in QGb2312Codec, and QGbkCodec.

Definition at line 64 of file qgb18030codec.h.

{ return _mibEnum(); }

◆ name()

QByteArray QGb18030Codec::name ( ) const
inline virtual

QTextCodec subclasses must reimplement this function.

It returns the name of the encoding supported by the subclass.

If the codec is registered as a character set in the IANA character-sets encoding file this method should return the preferred mime name for the codec if defined, otherwise its name.

Implements QTextCodec.

Reimplemented in QGb2312Codec, and QGbkCodec.

Definition at line 62 of file qgb18030codec.h.

{ return _name(); }

The documentation for this class was generated from the following files: