For example, I have a file f.txt on my Mac, where the system encoding is UTF-8.
It contains the bytes “\xE6\x97\xA5”, which is the Chinese character “日” (“day”) in UTF-8.
Then I use UltraEdit to save f.txt as the following files:
The actual stored content is “\xE6\x97\xA5”. If UltraEdit interprets it as GB18030, it is displayed as garbled text in the UltraEdit interface. It is then saved as a GB18030-encoded file, yet when opened as UTF-8 on the Mac it displays normally.
The actual stored content is “\xE6\x97\xA5”; interpreted as UTF-8, it is displayed as “日”.
If this is saved directly as GB18030, UltraEdit converts the encoding automatically, that is, it changes “\xE6\x97\xA5” to “\xC8\xD5”. After that, Vim opens the file and interprets it as ASCII.
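To make the byte values concrete, here is a minimal Python sketch (Python is my choice for illustration, it is not part of the original setup). It assumes the character in question is “日” and uses `errors="replace"` as a stand-in for whatever placeholder an editor draws for bytes it cannot decode.

```python
# The character under discussion, U+65E5.
text = "日"

utf8_bytes = text.encode("utf-8")    # what is stored on the Mac
gb_bytes = text.encode("gb18030")    # what a real conversion produces

print(utf8_bytes.hex(" "))  # e6 97 a5
print(gb_bytes.hex(" "))    # c8 d5

# Same bytes, wrong interpretation: nothing on disk changes,
# only the characters that are shown.
misread = utf8_bytes.decode("gb18030", errors="replace")
print(repr(misread))
```

The point is that "interpreting as GB18030" and "converting to GB18030" are different operations: the first leaves the bytes alone, the second rewrites them.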
Here comes the problem.
Since what is actually stored is just UTF-8 bytes, how does my editor know to interpret them as UTF-8? And what should I do if I want to see the GBK interpretation of those bytes, that is, the garbled text?
Is a marker added at the start of the file's binary content, and if so, how can I inspect it?
Or does the editor analyze the content to guess the encoding?
Take Vim as an example.
When Vim opens a text file, it reads it according to some encoding A and converts it to some encoding B; when saving, it converts the text to another encoding C. Other text editors work similarly, although they may not be as configurable or as automatic as Vim.
Encoding B: it has no effect on the file itself; it only concerns the encoding Vim uses when interacting with the operating system.
Encoding A: set with
    set fileencodings=ucs-bom,utf-8,gbk,cp936,latin-1
Vim checks the text file against these encodings in the listed order. Some byte sequences cannot occur in certain encodings, so if such a sequence is found, the file is not considered to be in that encoding and the next one is checked; otherwise that encoding is chosen.
Because any byte sequence is valid in latin-1, if you put it first the file will always be detected as latin-1.
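The try-in-order idea can be sketched in a few lines of Python. This is only an illustration of the principle, not Vim's actual detection code; `detect_encoding` is a hypothetical helper, and the `ucs-bom` step is left out.

```python
def detect_encoding(data, candidates=("utf-8", "gbk", "latin-1")):
    """Return the first candidate that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # impossible byte sequence for this encoding, try the next
        return name
    return None

utf8_file = "日".encode("utf-8")
gbk_file = "日".encode("gbk")

print(detect_encoding(utf8_file))                              # utf-8
print(detect_encoding(gbk_file))                               # gbk
print(detect_encoding(gbk_file, ("latin-1", "utf-8", "gbk")))  # latin-1
```

The last call shows why the permissive encoding has to go at the end of the list: it never fails, so nothing after it is ever tried.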
In general, a file's bytes carry no marker of the character encoding. Unicode, however, has a special character called the zero-width no-break space (U+FEFF). Its byte-swapped value, \FF\FE read as U+FFFE, is not assigned to any character, so the Unicode standard allows this character to be added at the very beginning of the text (it has no width in any font and has no effect on Chinese text; it exists to help with the display of some Southeast Asian languages). This makes it easy for a text editor to check both the encoding and the byte order. In source code that is pulled in with include, however, such files often cause problems (this is a big pitfall: the compiler treats it as an illegal character, but you cannot see it).
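To see whether a file starts with such a marker, you can look at its first bytes directly. Below is a minimal Python sketch; `sniff_bom` is a hypothetical helper name, and it deliberately ignores UTF-32, whose little-endian marker also begins with FF FE.

```python
import codecs

def sniff_bom(data):
    """Return a label for the byte order mark at the start of data, or None."""
    known = [
        (codecs.BOM_UTF8, "utf-8 with BOM"),   # ef bb bf
        (codecs.BOM_UTF16_LE, "utf-16-le"),    # ff fe
        (codecs.BOM_UTF16_BE, "utf-16-be"),    # fe ff
    ]
    for bom, label in known:
        if data.startswith(bom):
            return label
    return None

print(sniff_bom(codecs.BOM_UTF8 + "日".encode("utf-8")))  # utf-8 with BOM
print(sniff_bom("日".encode("utf-16")[:2] + b"\x00\x00") is not None)
print(sniff_bom("日".encode("utf-8")))                     # None
```

For a real file, read the first few bytes in binary mode and pass them to the function. A plain UTF-8 file without the marker returns None, which is why editors then have to fall back on guessing.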
Encoding C: set with
    set fileencoding=utf-8
This is the encoding used when saving; the text is converted to it automatically on save. However, if the wrong encoding was detected when the file was first opened, characters that do not exist in that encoding cannot be converted back correctly when the file is converted again.
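The lossy round trip can be reproduced outside any editor. This Python sketch assumes the file holds the UTF-8 bytes of “日” and that it is wrongly opened as GBK; `errors="replace"` again stands in for the editor's handling of undecodable bytes, which in a real editor may differ.

```python
original = "日".encode("utf-8")   # bytes on disk: e6 97 a5

# Step 1: the wrong encoding is assumed when the file is opened.
shown = original.decode("gbk", errors="replace")

# Step 2: the text, as the editor now understands it, is saved as UTF-8.
saved = shown.encode("utf-8")

print(original.hex(" "))
print(saved.hex(" "))
print(saved == original)  # False
```

Once the wrong characters have been accepted as the content, converting on save faithfully encodes the wrong characters, so the original bytes are not restored.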
Therefore, saving f1.txt as GB18030 may not perform any encoding conversion.
“The problem is that I want the actual stored data to be GB18030-encoded, but what should I do?” What does this mean?