Escape Characters - Secrets of CSV
We know that a CSV file contains data that consists of a bunch of fields separated by a comma, but the CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain a result character such as comma, double quotes, or less commonly newline character or carriage return character.
Simple CSV implementation may prohibit field values that contain a comma or other special characters. To get rid of that issue or to include that kind of result characters inside a cell, we will need an escape mechanism. In this session, we are going to learn how to use escaping characters around values that contain result characters.
Now, let's jump into the demo and take the example from the previous session where we extracted some information from the PDF document in CSV format using PDF.co cloud API. In the previous session, we extracted some of the information, such as account number, invoice number, company name, total amount, or invoice date, etc in CSV format using PDF.co API. Now, one thing you note is that this company name contained a comma in the name.
What happens if we put this name in the CSV field? Copy this company name and add that to our CSV file. Hereunder company name header, past our company name and save this file. Open the CSV file in Excel. As you can see that because of the comma in the company name, the "Inc" word is shifted to the next column. Because of this rest of the fields are also shifted to the right side.
How to fix this issue? It's very simple. Open the CSV file in the text editor. Here what you need to do is in this company name just add the value inside the double-quotes. Save this file and open it in Excel. Now you can see that all column values now reside in their respective header and here in the company name, even if the comma inside the value, is still in the correct column.
What happened here is once we embed the field inside a set of table quotes, Excel escaped these values. These double quotes are generally referred to as text qualifiers. Next, is what if my cell value contains this type of value, you see in this description column? Here you can see that I have added the percentage in double-quotes.
Sometimes we need to put the double quote to define the string inside the string itself. When we add double quotes, just as we did for this company name, then any little double quotes in the actual data like these double quotes will have to be escaped to avoid it being read as one of the enclosing double-quotes.
To fix this issue, there are two accepted ways of escaping double quotes in the CSV file. One is using a pair of consecutive double quotes to denote one literal double quote in the data, and the other way is by prefixing a double quote with an escaped character such as Backslash(\).
Add the pair of consecutive double quotes and wrap the entire field in another double quote. Save this file and open it in Excel. Now you can see that our field value is displaying a double quote in the description cell.
The basic rule is if the cell value contains a comma, a new line, or double quotes, then the string value should be enclosed in double-quotes. Second is, any double-quote character in the value should be escaped with another double quote. By placing double quotes, Excel knows that we are escaping the second double quotes and that means the double quotes are now treated as literal.
Other useful articles:
- CSV and Where It Is Used
- Essential Secrets of CSV
- Writing CSV - Secrets of CSV
- A Real-World Example of CSV Usage with PDF.co Cloud API
- Change Default CSV Separator Using Windows Culture Settings
- Escape Characters - Secrets of CSV
- Real-World Example of CSV Usage with Document Parser Template Editor
- Where Large CSV Files are Available and Where They Are Used