Unicode file format

Discussion in 'Mixed Languages' started by Xarzu, Apr 9, 2012.

  1. Xarzu

    #1 Xarzu, Apr 9, 2012
    Last edited: Apr 10, 2012
    I am going to write a computer program, out of necessity, that will convert an ASCII file to a Unicode file. The reason is simply that not all text input fields on the internet handle double-byte characters the same way. HTML files work around the problem that not all browsers display double-byte characters by putting &# in front of the character's Unicode number and a semicolon after it. For example: "& # 3 5 9 1 0 ;"

    But if you paste such a thing here, in this text input area, it will appear as "& # 3 5 9 1 0 ;"
    instead of 豆, which is how it would appear if a browser rendered it from an HTML file.
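    A minimal sketch of the two-way conversion described above, in Python. The function names are made up for illustration, and it only handles decimal references of the form &#NNNN; (not hexadecimal or named ones):

```python
import re

def to_numeric_refs(text):
    """Replace every non-ASCII character with a decimal reference like &#35910;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

def from_numeric_refs(text):
    """Turn decimal references such as &#35910; back into the characters they name."""
    # chr() raises ValueError for numbers above 0x10FFFF; this sketch does not guard against that.
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

if __name__ == "__main__":
    raw = "&#35910;"                    # decimal 35910 is code point U+8C46
    decoded = from_numeric_refs(raw)
    print(raw, "<->", decoded)          # shows both representations side by side
    assert to_numeric_refs(decoded) == raw
```

    Python's standard html.unescape() also decodes these references (plus named ones such as &amp;), so the regex is only there to keep the sketch explicit.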

    So, in order to see both character representations and be able to copy and paste them into a text input field, such as in a forum like this one, I have decided to write a conversion program. Sure, I guess I could use a browser, but it would not show me the raw HTML source at the same time, which would be helpful and cool for what I need to do.

    So, anyway, the point is this: I am going to make a program that creates a Unicode file. Is there any sort of special format I need to be aware of? Do Unicode files have special headers that tell reading programs that they are Unicode files?
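    On the header question, the closest thing I know of is the optional byte order mark (BOM): the code point U+FEFF written as the first bytes of the file. A rough Python sketch of writing and detecting one follows. The helper names are hypothetical, and it deliberately ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 one:

```python
import codecs

# BOM byte sequences for the encodings this sketch knows about.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}

def encode_with_bom(text, encoding="utf-16-le"):
    """Encode text and put the matching BOM in front of it."""
    return BOMS[encoding] + text.encode(encoding)

def sniff_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None

if __name__ == "__main__":
    payload = encode_with_bom("A")
    print(payload.hex(" "))            # ff fe 41 00
    print(sniff_bom(payload))          # utf-16-le
```

    A file without a BOM can still be valid Unicode, so a reader that finds none has to fall back on a default or on outside information about the encoding.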

    And, if memory serves me correctly, in the double-byte representation of ASCII characters the first two hex digits are zeros. Is that right?
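    That guess can be checked directly, assuming "double byte" here means UTF-16 with the high byte first (big-endian). A small sketch, with a made-up helper name:

```python
def utf16_hex(text, encoding="utf-16-be"):
    """Show the bytes of text in the given UTF-16 variant as space-separated hex."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))

if __name__ == "__main__":
    print(utf16_hex("A"))               # 00 41  (ASCII: high byte is zero)
    print(utf16_hex("A", "utf-16-le"))  # 41 00  (little-endian swaps the order)
    print(utf16_hex("\u8c46"))          # 8c 46  (non-ASCII: no zero byte)
```

    So the leading zeros only appear for ASCII characters, and only in big-endian order; in little-endian UTF-16 the zero byte comes second.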