On Parsing RTF Files and .doc Files

  java, question

Recently I have been working on file parsing. I currently use the open-source Apache POI library to parse .doc and .docx files, but it reports an error when the file is actually in RTF format. How can Java tell whether a file with the .doc extension is a real Word file or an RTF file?

Check the file header, that is, the first few bytes of the file. Common MIME type detection works on the same principle. Because your requirement is so simple, I don't recommend pulling in a third-party library for MIME type detection here.
A quick search shows that an RTF file begins with these bytes (hexadecimal): 7B 5C 72 74 66, which is the ASCII text {\rtf.
So you only need to read the first five bytes of the file, convert them to a hexadecimal string, and compare that string with "7b5c727466" to decide whether the file is RTF.

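import java.io.FileInputStream;

// Read the first five bytes and hex-encode them for comparison with the RTF signature.
byte[] bytes = new byte[5];
try (FileInputStream fis = new FileInputStream(file)) {
    fis.read(bytes, 0, bytes.length);
}
StringBuilder header = new StringBuilder();
for (byte b : bytes) {
    String hex = Integer.toHexString(b & 0xFF); // mask so bytes >= 0x80 are not sign-extended
    if (hex.length() < 2) { // pad single-digit values to two digits
        header.append('0');
    }
    header.append(hex);
}
boolean isRTF = "7b5c727466".contentEquals(header);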
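
The same check can also be done by comparing the raw bytes directly, without the hex-string round trip. The following is only a sketch: the class name DocSniffer, the Kind enum, and the method names are made up for illustration. Besides the RTF signature from the answer, it assumes two other well-known signatures: D0 CF 11 E0 A1 B1 1A E1 for the OLE2 container used by binary .doc files, and 50 4B 03 04 for the ZIP container used by .docx. Those two signatures only identify the container, so other Office or ZIP files will match them as well. It also assumes Java 11 or later for InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: classifies a file by its leading "magic" bytes.
final class DocSniffer {
    enum Kind { RTF, OLE2, ZIP, UNKNOWN }

    // "{\rtf" in ASCII, as given in the answer above.
    private static final byte[] RTF_MAGIC = {0x7B, 0x5C, 0x72, 0x74, 0x66};
    // OLE2 compound document header (binary .doc, but also .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4" (.docx, but also any other ZIP archive).
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    private DocSniffer() {}

    // True if data begins with every byte of magic.
    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies a buffer holding the first bytes of a file.
    static Kind detect(byte[] head) {
        if (startsWith(head, RTF_MAGIC)) return Kind.RTF;
        if (startsWith(head, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Kind.ZIP;
        return Kind.UNKNOWN;
    }

    // Reads up to eight bytes from the file and classifies them.
    static Kind detectFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8)); // Java 11+; returns fewer bytes for short files
        }
    }
}
```

With something like this, the caller can branch on DocSniffer.detectFile(path) and hand the file to POI only when the result is OLE2 or ZIP, treating RTF separately.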