How can I fix garbled text when scraping a web page with Nokogiri?

  question, ruby

Recently I was helping a classmate scrape some items from a website, such as the clothing pictures, titles, and prices.

The site is a bit odd: it does not declare a charset in its meta tags, and I used Nokogiri without specifying an encoding.

Grabbing pictures and links works fine, but any Chinese text comes out garbled.

I read the official documentation. Nokogiri lets you specify an encoding, for example:
doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')
I tried specifying a few encodings, such as GBK, but none of them fixed the garbled text.
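For reference, here is a minimal sketch of what specifying an encoding is meant to achieve, using only Ruby's built-in String methods. The helper name `to_utf8` and the sample markup are assumptions for illustration, not part of Nokogiri, and the sample assumes the page bytes really are GBK.

```ruby
# Hypothetical helper (not part of Nokogiri): reinterpret raw bytes as
# `source_encoding` and transcode them to UTF-8.
def to_utf8(raw, source_encoding)
  raw.dup.force_encoding(source_encoding).encode('UTF-8')
end

# Stand-in for a fetched page body: Chinese text stored as GBK bytes
# (the GBK encoding here is an assumption for the sake of the example).
raw = '<title>中文</title>'.encode('GBK').b

# Read as UTF-8, these bytes are not a valid string, which is the kind of
# mismatch that shows up as garbled text.
puts raw.dup.force_encoding('UTF-8').valid_encoding?

# Decoded with the matching source encoding, the text survives.
html = to_utf8(raw, 'GBK')
puts html

# With Nokogiri loaded, the decoded string could then be parsed as UTF-8:
#   Nokogiri::HTML(html, nil, 'UTF-8')
```

If the encoding passed in does not match the bytes the server actually sent, the result will still be wrong, so the declared encoding has to describe the real page bytes.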

How should this situation be resolved?

doc = Nokogiri::HTML(open(''), nil, 'UTF-8')

=> #(Document:0x3fc3974355f4 {
name = "document",
children = [
#(DTD:0x3fc397424bf0 { name = "html" }),
#(Element:0x3fc39741fc18 {
name = "html",
attributes = [
#(Attr:0x3fc39740fa20 {
name = "xmlns",
value = "
children = [
#(Text "\r\n"),
#(Element:0x3fc3973da190 {
name = "head",
children = [
#(Text "\r\n"),
#(Element:0x3fc3973cf6dc {
name = "title",
children = [#(Text "Nine Days International A218- Internet Business Park")]

With UTF-8 specified, the page parses normally.
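One possible reading of this result is that the page bytes were UTF-8 all along, which would explain why GBK did not help. When a page declares no charset, a rough way to check is to test the raw bytes against a few candidate encodings. The sketch below is a heuristic only; `guess_encoding` and the candidate list are hypothetical, not a Nokogiri feature.

```ruby
# Hypothetical helper: return the first candidate encoding under which the
# raw bytes form a valid string, or nil if none fit.
def guess_encoding(raw, candidates = %w[UTF-8 GBK])
  candidates.find { |name| raw.dup.force_encoding(name).valid_encoding? }
end

utf8_bytes = '中文'.b                # bytes as a UTF-8 page would send them
gbk_bytes  = '中文'.encode('GBK').b  # bytes as a GBK page would send them

puts guess_encoding(utf8_bytes)
puts guess_encoding(gbk_bytes)
```

UTF-8 is tried first on purpose: it is strict, so bytes from another encoding rarely pass its validity check, whereas UTF-8 bytes can happen to look valid under GBK. The guessed name could then be passed as the third argument to Nokogiri::HTML.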