A Real-World Example of CSV Usage with PDF.co Cloud API

In the previous session, we learned some basic concepts of CSV and wrote a CSV file. Now it's time to look at a real-world use case for CSV.

ByteScout provides PDF.co, a RESTful web API for document manipulation, data extraction, data conversion, and more. To show you how CSV works in the real world, we will invoke one of the PDF.co APIs in Postman.

PDF.co provides many APIs for manipulating PDF documents. Of these, let's use the Document Parser API, which parses PDF documents and extracts data from them using a predefined custom data extraction template. With this API method, you can extract fields, tables, and values from invoices, statements, orders, and other PDF or scanned documents.

Here we can see the endpoint of the API. Next, we need to set the registered API key in the request header. We will use Postman to run this API. By the way, you can download this API request collection from the Postman Collection URL and import it into Postman. Open Postman and look at some of the input parameters for this API there.

I have already set the required parameters of this API. Here is our API endpoint. Now let's go through the important parameters one by one. Let us start with the URL: this is the URL that points to the actual PDF file. You can use links from Google Drive, Dropbox, or the built-in PDF.co file storage.

Now open this URL to see the PDF file from which we are going to extract the data. This is the PDF file we will pass to the PDF.co API, and the API will fetch fields such as the invoice number, company name, account number, invoice date, and total amount. Let's see how the PDF.co API fetches all this information.

Next, we will set the template ID; in our case, it is one (1). What exactly is this template? We have a sample template. A template is simply a YAML or JSON file that contains detection rules for extracting specific data from the PDF document.

Next, we will set the output format for this API. This parameter sets the format of the API response; in our case, we will set it to CSV. The next parameter controls whether a CSV header row is generated; in our case, we want the header.
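The parameters covered so far can be gathered into a single request body. The snippet below is a minimal sketch of that idea in Python rather than Postman. The key names `url`, `templateId`, `outputFormat`, and `generateCsvHeaders` are assumptions based on the parameters described above, and the PDF link is a placeholder, so check them against the PDF.co documentation before relying on them.

```python
import json


def build_parser_payload(pdf_url, template_id="1", output_format="CSV",
                         generate_csv_headers=True):
    """Collect the request parameters described above into one dict.

    The key names are assumptions; verify them in the PDF.co docs.
    """
    return {
        "url": pdf_url,                              # link to the source PDF
        "templateId": template_id,                   # extraction template to apply
        "outputFormat": output_format,               # ask for CSV output
        "generateCsvHeaders": generate_csv_headers,  # include a header row
    }


# Placeholder PDF link, used only for illustration.
payload = build_parser_payload("https://example.com/sample-invoice.pdf")
print(json.dumps(payload, indent=2))
```

Keeping the parameters in one function makes it easy to switch the output format or template later without touching the rest of the script.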

We will set it to “true”. Last but not least, we set the API key in the request header. I have already set my registered API key in this request header. Now it's time to invoke the API: click the "Send" button to get the output. Here we go: the API returns a JSON-formatted response. Our actual data resides in the body object, which contains the data extracted from the PDF in CSV format.
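For readers who prefer a script to Postman, here is a rough sketch of the same two steps: attaching the API key as a request header and pulling the CSV text out of the JSON response. The endpoint URL and the `x-api-key` header name are assumptions to confirm in the PDF.co documentation, the sample response is made up for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint for the Document Parser API; confirm it in the PDF.co docs.
ENDPOINT = "https://api.pdf.co/v1/pdf/documentparser"


def make_request(api_key, payload):
    """Build (but do not send) a POST request that carries the API key header."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_csv(response_text):
    """Return the CSV text stored in the "body" field of the JSON response."""
    response = json.loads(response_text)
    if "body" not in response:
        raise ValueError("response has no 'body' field")
    return response["body"]


# Hypothetical response, shaped like the one described above.
sample_response = json.dumps(
    {"body": "companyName,invoiceId\nACME Inc.,INV-001\n"}
)
print(extract_csv(sample_response))

# To actually call the API, pass the request to urllib.request.urlopen(...)
# and feed the decoded response text to extract_csv().
```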

Copy the content of the body object, paste it into a new CSV file, and save the file. Open the file location, then open the file in Microsoft Excel and compare the extracted fields side by side with the actual PDF file.
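The copy, paste, and save step can also be done in a few lines of Python. This is only a sketch: the CSV text and the file name `invoice_extract.csv` are placeholders, and the file is written to the system temporary folder so the example is safe to run anywhere.

```python
import os
import tempfile


def save_csv(csv_text, path):
    """Write the CSV text to a file without altering its line endings."""
    # newline="" stops Python from translating the line breaks in csv_text.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(csv_text)
    return path


# Placeholder CSV text standing in for the "body" value of the API response.
csv_text = "companyName,invoiceId\r\nACME Inc.,INV-001\r\n"
out_path = save_csv(
    csv_text, os.path.join(tempfile.gettempdir(), "invoice_extract.csv")
)
print("Saved to", out_path)
```

The saved file can then be opened in Excel just like the one created by hand.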

As you can see, the company name has been extracted, and in the same way the invoice ID has been extracted from the PDF. Here is the invoice date, and likewise the bank account number extracted from the PDF file. The total invoice amount has been extracted from the PDF too.
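The same field-by-field check can be done without Excel by reading the header row and the first data row with Python's `csv` module. The column names and values below are hypothetical; the real ones depend on the template used for extraction.

```python
import csv
import io


def read_fields(csv_text):
    """Parse the header row and first data row into a name -> value mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return next(reader)


# Hypothetical extraction result; real column names come from the template.
sample = (
    "companyName,invoiceId,dateIssued,bankAccount,total\n"
    'ACME Inc.,INV-001,2019-01-01,123456789,"1,234.50"\n'
)

row = read_fields(sample)
for name, value in row.items():
    print(f"{name}: {value}")
```

Note that the total is wrapped in double quotes in the raw text because it contains a comma; the `csv` module removes the quotes and keeps the value in a single field.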

This is how you can use this API for your own requirements. The next session is going to be very interesting: we will see how and where you can change the default separator for CSV files in the Windows settings.
