
About encoding in detail

Advanced XML Converter is a utility that converts data from XML into several tabular formats.

Character encodings provide a mapping between numbers and the characters people expect to see when they enter text into computers. The capital letter "A", for example, is represented by the decimal number 65 (41 in hexadecimal) in a variety of character encodings, including the ASCII text familiar to many Western programmers and Windows code page 1252, the default encoding used by most Western-language Microsoft Windows systems.
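As a small illustration of that mapping, the following Python sketch looks up the number behind "A" in both encodings. The variable names are only illustrative.

```python
# The capital letter "A" maps to the same number in ASCII and in CP1252.
letter = "A"

code_point = ord(letter)                 # Unicode code point of the character
ascii_bytes = letter.encode("ascii")     # the byte used by ASCII
cp1252_bytes = letter.encode("cp1252")   # the byte used by Windows code page 1252

print(code_point, hex(code_point))       # 65 0x41
print(ascii_bytes[0], cp1252_bytes[0])   # 65 65
```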

Character encodings are not fonts. Fonts provide glyphs, the graphic representations of characters, and map those glyphs to a particular character encoding. Microsoft Word, for example, includes a version of Arial (Arial Unicode MS) with tens of thousands of characters.

All XML processors are required to understand two transformation formats of the Unicode character encoding, UTF-8 and UTF-16. The Microsoft XML Parser (MSXML) supports more encodings, but it treats all text in XML documents internally as the Unicode UCS-2 character encoding.
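To make the difference between the two formats concrete, here is a minimal Python sketch that encodes the same character both ways. The choice of "é" (U+00E9) is arbitrary, and big-endian UTF-16 without a byte order mark is used only so that the output is the same on every platform.

```python
# One Unicode character, two transformation formats.
ch = "\u00e9"  # "é", code point U+00E9

utf8_bytes = ch.encode("utf-8")        # two bytes in UTF-8
utf16_bytes = ch.encode("utf-16-be")   # one 16-bit unit, big-endian, no BOM

print(utf8_bytes.hex())    # c3a9
print(utf16_bytes.hex())   # 00e9

# Both byte sequences decode back to the same character.
same = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be")
```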

Even different platforms representing the same set of Western characters can use different bytes to represent the same character, as shown in the following table.

Byte   Windows (CP1252)   Macintosh (MacRoman)
140    Œ                  å
229    å                  Â
231    ç                  Á
232    è                  Ë
233    é                  È
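The table can be reproduced with a few lines of Python, which ships codecs for both code pages (named "cp1252" and "mac_roman" in the standard library). This is only a sketch for checking the values; the variable names are illustrative.

```python
# Decode the same five bytes under each platform's code page.
raw = bytes([140, 229, 231, 232, 233])

windows_text = raw.decode("cp1252")     # interpretation on Windows
mac_text = raw.decode("mac_roman")      # interpretation on classic Macintosh

for byte, win_char, mac_char in zip(raw, windows_text, mac_text):
    print(byte, win_char, mac_char)
```

The same byte values produce different text, so a file has to travel with the name of its encoding to be read correctly.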

Parsers can read in documents written in ISO-8859-1, Big5, or Shift_JIS, but the processing rules treat everything as Unicode. XML parsers perform the conversion while loading XML documents.
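The conversion on load can be observed with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree rather than MSXML, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A small document stored as ISO-8859-1 bytes; "é" is the single byte 0xE9.
xml_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<name>caf\u00e9</name>"
).encode("iso-8859-1")

root = ET.fromstring(xml_bytes)   # the parser converts to Unicode while loading
text = root.text                  # a Unicode string, independent of the file encoding

print(text == "caf\u00e9")        # True
```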

There are some limitations to auto-detecting character encodings. For example, 7-bit ASCII text is valid UTF-8, but UTF-8 is more than 7-bit ASCII text, and bytes above 127 from an 8-bit code page are generally not valid UTF-8. For reliable processing, XML documents that use character encodings other than UTF-8 or UTF-16 must include an encoding declaration in the XML declaration. This makes it possible for a parser to read the characters correctly or report errors when it cannot process an encoding.
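A short Python sketch of that limitation: pure ASCII bytes decode cleanly as UTF-8, while bytes from an 8-bit encoding such as ISO-8859-1 typically do not. The sample strings are arbitrary.

```python
# ASCII bytes are also valid UTF-8.
ascii_bytes = "plain text".encode("ascii")
ascii_as_utf8 = ascii_bytes.decode("utf-8")

# ISO-8859-1 bytes above 127 are not valid UTF-8 on their own.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")   # ends with the byte 0xE9
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(ascii_as_utf8, decoded_ok)   # plain text False
```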

Because the XML declaration is written in basic ASCII text, parsers can read its contents even if the document is in a very different encoding. The encoding declaration significantly increases the likelihood that documents in encodings other than UTF-8 and UTF-16 will be interpreted correctly.
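The idea of reading the declaration before knowing the encoding can be sketched as follows. The helper declared_encoding and its regular expression are hypothetical and simplified: they assume an ASCII-compatible encoding and do not handle UTF-16 or byte order marks, which a real parser must also detect.

```python
import re

# Match the encoding pseudo-attribute in an XML declaration, on raw bytes.
DECL_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding name, or the default if none is declared."""
    match = DECL_RE.match(data[:200])   # the declaration sits at the very start
    if match:
        return match.group(1).decode("ascii")
    return default

# A document stored in Shift_JIS; its declaration is still plain ASCII.
doc = '<?xml version="1.0" encoding="Shift_JIS"?><t>\u65e5\u672c</t>'.encode("shift_jis")
name = declared_encoding(doc)
text = doc.decode(name)
```

Here name is read from the ASCII-only prefix, and only then is the rest of the document decoded with it.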

Some transactions, for example, those carried over HTTP and e-mail protocols, also provide information about character encodings. Microsoft Internet Explorer uses that information in document processing, but it isn't available, for example, if you load an XML document from a local hard drive or even a file server.
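For HTTP and e-mail, the encoding information usually travels in the charset parameter of the Content-Type header. The following sketch extracts it with Python's standard email package; the helper name charset_from_content_type is hypothetical, and this does not reflect how Internet Explorer itself is implemented.

```python
from email.message import Message

def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()

with_charset = charset_from_content_type("text/xml; charset=ISO-8859-1")
without_charset = charset_from_content_type("application/xml")

print(with_charset, without_charset)   # iso-8859-1 None
```

When a file is read from disk there is no such header, so the encoding declaration inside the document is the only hint available.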

Copyright © HiBase Group, 2002-2016