Opening Problem Statement
Meet Sarah, a data analyst working for a financial services company. Every week, Sarah receives data files in complex formats such as Parquet, Avro, ORC, and Feather from various departments and external partners. Each file needs to be converted into JSON to be ingested into her company’s analytics platform. Converting these files by hand with multiple tools is cumbersome, error-prone, and time-consuming, taking her several hours each week. This slows down her workflow, delays insights, and increases the chance of mistakes.
Sarah’s challenge is common among data professionals and developers dealing with big data or cross-system data sharing. Complex file formats like Parquet or Avro are efficient for storage and processing but difficult to transform into JSON without specialized software. Many solutions require coding or manual intervention, creating bottlenecks.
What This Automation Does
This n8n workflow automatically converts Parquet, Avro, ORC, and Feather files into JSON format using the ParquetReader API. When you upload a file, the workflow:
- Receives the file through a webhook trigger (supports multiple file types).
- Sends the file as multipart/form-data to the ParquetReader API for parsing.
- Receives the parsed JSON data, metadata, and schema from the API response.
- Parses the API response strings into usable JSON objects.
- Returns ready-to-use JSON data for further processing or integration.
Benefits include saving hours on complex file conversions, reducing manual errors, and enabling seamless integration of big data formats into JSON-based workflows or databases.
Prerequisites ⚙️
- n8n account (cloud or self-hosted). You can consider self-hosting options like Hostinger.
- Access to ParquetReader API (no special credentials needed for this example; it’s a public endpoint).
- Tools for triggering file uploads like curl, Postman, or another n8n workflow to call the webhook.
Step-by-Step Guide to Set Up and Use the Workflow
Step 1: Understanding the Webhook Trigger
Navigate in n8n to the workflow, and locate the Webhook node. This node serves as the entry point for your file conversion process.
- Go to your n8n dashboard → workflows → open “Convert Parquet, Avro, ORC & Feather via ParquetReader to JSON”.
- Click the Webhook node (type: Webhook).
- Note the URL path under parameters: convert. This means your webhook URL is http://your-n8n-instance/webhook-test/convert.
- HTTP method is POST, and it’s configured to accept a binary file under the field named file.
Expected outcome: When a file is sent to this endpoint, the workflow triggers.
Common mistake: Forgetting to use the POST method or the correct form-data field name file will cause the workflow not to trigger.
Step 2: Upload a File to Trigger the Workflow
You can test the webhook quickly using command-line curl:
curl -X POST http://localhost:5678/webhook-test/convert \
  -F "file=@converted.parquet"
Replace converted.parquet with the path to your own Parquet, Avro, ORC, or Feather file.
Expected outcome: The workflow will capture the uploaded file and proceed to the next step.
Common mistake: Incorrect file path or unsupported file formats may cause errors.
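If you would rather script the upload than use curl, the same request can be sketched in JavaScript. This is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch, FormData, and Blob) and assuming the webhook is configured to respond with the workflow output as JSON. The helper names buildUploadForm and uploadToWebhook are hypothetical; the URL and the form field name file come from the steps above.

```javascript
// Sketch only: helper names are hypothetical, not part of the workflow.
const WEBHOOK_URL = 'http://localhost:5678/webhook-test/convert';

// Build a multipart/form-data body with the file under the form field "file".
function buildUploadForm(bytes, filename) {
  const form = new FormData();
  form.append('file', new Blob([bytes]), filename);
  return form;
}

// POST the form to the webhook and return the parsed JSON response.
// Assumes the webhook answers with JSON; adjust if yours responds differently.
async function uploadToWebhook(bytes, filename, url = WEBHOOK_URL) {
  const response = await fetch(url, {
    method: 'POST',
    body: buildUploadForm(bytes, filename),
  });
  if (!response.ok) {
    throw new Error(`Webhook returned HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (needs a running n8n instance that is listening for a test event):
// const { readFile } = require('node:fs/promises');
// readFile('converted.parquet')
//   .then((bytes) => uploadToWebhook(bytes, 'converted.parquet'))
//   .then((result) => console.log(result));
```

Letting fetch build the multipart body means the Content-Type header and its boundary are set automatically, so there is no need to set that header by hand.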
Step 3: HTTP Request Node Sends File to ParquetReader API
This node named Send to Parquet API takes the incoming binary file and sends it to the ParquetReader API for parsing.
- Click the Send to Parquet API node (type: HTTP Request).
- Check that the URL is set to: https://api.parquetreader.com/parquet?source=n8n.
- Ensure method is POST, and sendBinaryData is enabled with the binary property name set to file0.
- The payload content type is multipart/form-data to properly send the file.
Expected outcome: The API receives the file and returns parsed JSON data, meta_data, and schema.
Common mistake: Not enabling sendBinaryData or incorrect binary property name causes failure or empty responses.
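When debugging empty responses, it can help to check which of the expected top-level fields actually came back. The following is a small sketch of such a check. The field names data, meta_data, and schema come from the expected outcome above; the helper name missingFields is hypothetical, and the real response may contain other fields as well.

```javascript
// Sketch only: field names are taken from this guide, the helper is hypothetical.
const EXPECTED_FIELDS = ['data', 'meta_data', 'schema'];

// Return the expected fields that are absent from the API response.
function missingFields(responseJson) {
  if (responseJson === null || typeof responseJson !== 'object') {
    return [...EXPECTED_FIELDS];
  }
  return EXPECTED_FIELDS.filter((field) => !(field in responseJson));
}

console.log(missingFields({ data: '[]', meta_data: '{}' })); // logs [ 'schema' ]
```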
Step 4: Parsing the API Response with Code Node
The Parse API Response node (type: Code) converts some stringified JSON fields from the API into usable JSON objects.
- Click this node and open its JS code editor.
- The code snippet is:
const item = items[0];
// Convert `data` (stringified JSON array) → actual array
if (typeof item.json.data === 'string') {
  item.json.data = JSON.parse(item.json.data);
}
// Convert `meta_data` (stringified JSON object) → actual object
if (typeof item.json.meta_data === 'string') {
  item.json.meta_data = JSON.parse(item.json.meta_data);
}
return [item];
This ensures fields are properly parsed for use downstream.
Expected outcome: Final JSON output ready for integration.
Common mistake: Editing this code without understanding the response format can break JSON parsing.
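The snippet above only handles the first item and throws if a field contains invalid JSON. A more defensive variant might look like the sketch below. It is not the workflow's original code: the function name parseItems and the *_parse_error fields are hypothetical, and it assumes that keeping the raw string is preferable to failing the whole execution.

```javascript
// Sketch only: a defensive variant, not the code shipped with the workflow.
const STRINGIFIED_FIELDS = ['data', 'meta_data'];

// Parse stringified JSON fields on every item without mutating the input.
function parseItems(items) {
  return items.map((item) => {
    const json = { ...item.json };
    for (const field of STRINGIFIED_FIELDS) {
      if (typeof json[field] !== 'string') continue;
      try {
        json[field] = JSON.parse(json[field]);
      } catch (error) {
        // Keep the raw string and record why parsing failed.
        json[`${field}_parse_error`] = error.message;
      }
    }
    return { ...item, json };
  });
}

// In the Code node, the final line would be: return parseItems(items);
```

Recording the error on the item, instead of throwing, lets a later node decide whether to stop or continue with the unparsed value.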
Customizations ✏️
- Support Additional File Formats: In the Webhook node, adjust the file validation or add conditional check nodes based on file type to handle more formats supported by ParquetReader.
- Use Another API Endpoint: Change the URL in the HTTP Request node to point to other supported file conversion APIs, adapting the content type accordingly.
- Save JSON Output to Storage: Add a node after “Parse API Response” to save the converted JSON to Google Drive, AWS S3, or any database using relevant n8n nodes.
- Trigger from Another Workflow: Call this workflow’s webhook from another n8n workflow via HTTP Request node to enable chained automations.
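For the file-type check mentioned in the first customization, one option is a small extension test that runs before the file is sent to the API. This is a sketch under assumptions: the extensions listed are the conventional ones for these formats, not a list confirmed by ParquetReader, and the helper name isSupportedFile is hypothetical.

```javascript
// Sketch only: extension list is an assumption, not taken from the API docs.
const SUPPORTED_EXTENSIONS = ['parquet', 'avro', 'orc', 'feather'];

// Return true when the filename ends in one of the supported extensions.
function isSupportedFile(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot < 0) return false; // no extension at all
  const extension = filename.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.includes(extension);
}

console.log(isSupportedFile('sales.PARQUET')); // logs true
console.log(isSupportedFile('notes.csv')); // logs false
```

An extension check is only a quick filter. A renamed file can still fail in the API, so the error handling after the HTTP Request node remains necessary.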
Troubleshooting 🔧
- Problem: “Webhook does not receive files or does not trigger.”
Cause: Incorrect HTTP method or missing multipart/form-data in request.
Solution: Confirm you use POST and include a form field named “file” containing the binary file.
- Problem: “Empty or error response from ParquetReader API.”
Cause: Incorrect binary property name or API URL misconfiguration.
Solution: Check the HTTP Request node settings: the binary property name must be “file0” and the URL must match the one given in Step 3 exactly.
- Problem: “Code node JSON parse errors or crashes.”
Cause: The API response format changed, or the code was modified incorrectly.
Solution: Revert to the original JS code and test with valid input files.
Pre-Production Checklist ✅
- Test the webhook URL with valid sample Parquet, Avro, ORC, and Feather files.
- Verify the HTTP Request node’s URL, method, and binary property match the API documentation.
- Ensure the Code node correctly parses the response without errors.
- Confirm the workflow triggers and completes within expected time frames.
- Back up your workflow before deploying it to production.
Deployment Guide
To deploy, activate the workflow in n8n by toggling it to Active.
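Make sure your n8n instance is reachable at a public or internal URL so it can receive webhook calls. Once the workflow is active, call the production URL (by default /webhook/convert) rather than the /webhook-test/ URL used during testing.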
Monitor workflow executions via the n8n UI to catch any errors or slow responses from the API.
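By default, n8n serves webhooks under /webhook-test/ while the editor is listening for a test event and under /webhook/ once the workflow is active. A client that needs to switch between the two could build the URL as in the sketch below. The helper name webhookUrl is hypothetical, and the prefixes are n8n's defaults, which a self-hosted instance may have changed.

```javascript
// Sketch only: assumes the default n8n webhook path prefixes.
function webhookUrl(baseUrl, path, { test = false } = {}) {
  const prefix = test ? 'webhook-test' : 'webhook';
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  const cleanPath = path.replace(/^\/+/, ''); // drop leading slashes
  return `${base}/${prefix}/${cleanPath}`;
}

console.log(webhookUrl('http://localhost:5678', 'convert', { test: true }));
// logs http://localhost:5678/webhook-test/convert
console.log(webhookUrl('http://localhost:5678/', 'convert'));
// logs http://localhost:5678/webhook/convert
```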
FAQs
- Q: Can I use this to convert huge files?
A: It depends on the API limits at parquetreader.com; consider splitting large files in advance or contacting API support.
- Q: Do I need API keys?
A: This example uses a public API endpoint that requires no authentication, but check documentation for production use.
- Q: Can I save the JSON output directly into databases?
A: Yes, you can extend the workflow with nodes like Google Sheets, PostgreSQL, or others supported by n8n.
Conclusion
With this workflow, you have automated the tedious task of converting multiple complex data formats (Parquet, Avro, ORC, and Feather) into JSON, ready for data analysis or integration. It saves you hours of manual work every week and reduces errors caused by manual conversions.
By leveraging the ParquetReader API and n8n’s powerful workflow automation, Sarah and others like her can focus on deriving insights rather than wrestling with file formats.
Next steps could include automating JSON data storage, triggering machine learning pipelines, or integrating results directly into dashboards or business intelligence tools.
Give this workflow a try and radically simplify your data conversion process today.