How can MongoDB correctly insert text records that contain double quotation marks?

  mongodb, question

Import the test.csv file using the bundled mongoimport.exe (the test contents are as follows):

name,pass
test1,ztj"ile0
test2,"audreyhepburn"
test3,Xiaoya”””oge521
test4,""520xiangbin

Question:
After importing, I query with find({name:/^test/}) and find that the pass field is wrong for every record: the values are completely different from the originals in the CSV (shown as null, or only part of the text, and so on). How can MongoDB correctly insert text records that contain double quotation marks?

Neither inserting records one by one nor batch importing works for records that contain double quotation marks, even when they are escaped with "\". Any help from the experts would be appreciated!

According to the CSV standard (RFC 4180):

file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D
DQUOTE = %x22
LF = %x0A
CRLF = CR LF
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E

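In your example, test1 and test4 are invalid: a non-escaped field may not contain DQUOTE at all, and an escaped field must end at its closing DQUOTE, with any double quote inside it written as two. I have not confirmed whether MongoDB strictly follows RFC 4180 when parsing CSV, but your file format is definitely a major problem.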
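To make the grammar above concrete, here is a minimal JavaScript sketch that checks a single field value against the `field` rule. The names `TEXTDATA`, `NON_ESCAPED`, `ESCAPED` and `isValidField` are hypothetical helpers written for this illustration. It only models the RFC 4180 grammar and says nothing about how mongoimport itself parses a file.

```javascript
// TEXTDATA = %x20-21 / %x23-2B / %x2D-7E (printable ASCII except DQUOTE and COMMA)
const TEXTDATA = '[\\x20-\\x21\\x23-\\x2B\\x2D-\\x7E]';

// non-escaped = *TEXTDATA
const NON_ESCAPED = new RegExp(`^${TEXTDATA}*$`);

// escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
const ESCAPED = new RegExp(`^"(?:${TEXTDATA}|,|\\r|\\n|"")*"$`);

// field = (escaped / non-escaped)
function isValidField(field) {
  return NON_ESCAPED.test(field) || ESCAPED.test(field);
}

const samples = [
  'ztj"ile0',        // test1: bare quote in an unquoted field, invalid
  '"audreyhepburn"', // test2: properly quoted field, valid
  '""520xiangbin',   // test4: text after the closing quote, invalid
];

for (const value of samples) {
  console.log(value, isValidField(value));
}
```

Under this reading of the grammar, the test1 value would have to be written as "ztj""ile0" to be accepted.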

So I still recommend normalizing your CSV file with a tool before importing it into the database. I don't know how much data you have, but this is only simple text processing, so the time it takes should be acceptable.

Here is one approach. It is not perfect, but it should work in most situations:

fs = require 'fs'
# Assumes the input is test.csv in the current directory
lines = fs.readFileSync('test.csv', 'utf8').split(/\r?\n/)

# For each line except the first (header) line:
for line in lines[1..]
  # Use the part before the first comma as name and the part after it as pass
  [_, name, pass] = line.match(/^([^,]+),(.*)/) ? []

  # If both name and pass exist
  if name and pass
    # If pass, ignoring leading and trailing spaces, is not wrapped in double quotation marks,
    # or if there is a lone double quotation mark in the middle of pass, escape it
    unless pass.trim().match(/^".*"$/) and not pass.match(/[^"]"[^"]/)
      # Double every double quotation mark
      pass = pass.replace(/"/g, '""')
      # Wrap the value in double quotation marks
      pass = '"' + pass + '"'

    console.log [name, pass].join ','
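
The escaping rule in the plan above can be sanity-checked on the sample values by quoting each one and decoding it again. The following is a rough JavaScript sketch of that round trip. `quoteField` and `unquoteField` are hypothetical helper names, and the decoder simply assumes its input is a well-formed escaped field.

```javascript
// Hypothetical helper: the same rule as the plan, applied to one raw value.
function quoteField(raw) {
  // Treated as already escaped when wrapped in quotes with no lone quote inside.
  const alreadyQuoted = /^".*"$/.test(raw.trim()) && !/[^"]"[^"]/.test(raw);
  if (alreadyQuoted) return raw;
  // Double every double quote, then wrap the whole value in double quotes.
  return '"' + raw.replace(/"/g, '""') + '"';
}

// Hypothetical helper: decode an escaped field back to its value.
// Assumes the input starts and ends with a double quote.
function unquoteField(field) {
  return field.slice(1, -1).replace(/""/g, '"');
}

const samples = ['ztj"ile0', '"audreyhepburn"', '""520xiangbin'];
for (const raw of samples) {
  const quoted = quoteField(raw);
  console.log(raw, '->', quoted, '->', unquoteField(quoted));
}
```

The test1 and test4 values come back unchanged after the round trip. The test2 value is treated as already escaped, so it decodes to audreyhepburn without the surrounding quotes. If those quotes were meant to be part of the password, that is one of the cases where this plan is not perfect.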

https://tools.ietf.org/html/rfc4180