1. Opening Problem Statement
Meet Sarah, a compliance officer at a mid-sized marketing firm. Every day, Sarah receives multiple CSV reports containing customer data uploaded to a shared Google Drive folder. Her task? Identify and delete columns with sensitive personally identifiable information (PII) before forwarding these sanitized reports to external partners. This manual process takes her nearly 3 hours daily and is prone to human error, risking leaks of sensitive data that could lead to compliance violations and hefty fines.
Imagine the tediousness of opening each CSV, scanning dozens of columns, deleting name, email, phone, or other PII data, and saving the file back — all repeatedly. Meanwhile, sensitive data risks accidental exposure. Sarah’s struggle is not unique; many teams handling files with PII face similar challenges daily.
2. What This Automation Does
This n8n workflow automates Sarah’s painful manual process with precision and speed, triggered immediately when a new CSV file appears in a specified Google Drive folder. Here’s exactly what happens:
- Detects new CSV files uploaded to your Google Drive folder and triggers the workflow instantly.
- Downloads the CSV file from Google Drive for processing.
- Extracts and sends tabular headers to an OpenAI model to identify which columns contain PII.
- Removes those PII columns from the CSV data using a custom Code node.
- Generates a new sanitized CSV file with PII removed, appending “_PII_removed” to the filename.
- Uploads the cleaned file back into a dedicated Google Drive folder for processed files.
By automating this, Sarah can save 3+ hours of tedious work daily, reduce errors, and guarantee compliance without lifting a finger.
3. Prerequisites ⚙️
- n8n account (cloud or self-hosted) 🔌
- Google Drive account with API access enabled 📁🔑
- OpenAI account with API keys configured 🔐
- Basic familiarity with CSV files and folder structure in Google Drive 📊
For those interested in self-hosting n8n, Hostinger offers an easy solution: host n8n yourself with Hostinger.
4. Step-by-Step Guide to Build This Workflow
Step 1: Set up Google Drive Trigger to Watch Folder
In n8n, add the Google Drive Trigger node.
Navigate to the node settings and authenticate your Google Drive account.
Set the trigger event to “fileCreated”.
Choose “Trigger On Specific Folder” and select or enter the folder ID of your Drive folder to monitor.
Set polling to every minute for near real-time processing.
You should see your folder’s ID and name verified.
Common mistake: Forgetting to set the correct folder ID or permissions may prevent triggering.
Step 2: Download the Uploaded File from Google Drive
Add a Google Drive node and set the operation to download.
Map the fileId to the ID received from the trigger node ({{ $json.id }}).
Set the binaryPropertyName to “data” for file content.
Ensure the same Google Drive credentials are used.
This node fetches the actual CSV file for processing.
Step 3: Extract Tabular Data from the File
Add the Extract from File node.
Configure it to extract plain CSV content.
This node outputs the parsed tabular data and headers that will be analyzed.
No special input required besides the downloaded file representation.
Step 4: Use OpenAI to Identify PII Columns
Insert the OpenAI (LangChain) node.
Configure it with your OpenAI credentials.
Set the model to “gpt-4o-mini” for a balance of performance and speed.
In the messages section, provide a system instruction asking to analyze the provided tabular headers and identify only the PII-containing columns.
Use expressions referencing the CSV headers extracted:
{"role":"system","content":"Analyze the provided tabular data and identify the columns that contain personally identifiable information (PII). Return only the column names that contain PII, separated by commas."}and
{"content":"=Here is some tabular data with column headers and example rows. Headers: {{Object.keys($json)}} Example Row 1: {{Object.values($json)}}"}This instructs OpenAI to return a simple comma-separated list of PII columns.
Expected output is JSON with a text field listing the columns.
Step 5: Merge OpenAI Output with Filename and Data
Add two Split Out nodes to extract the filename and the OpenAI response separately.
Then merge these with the main data stream using a Merge node configured for 3 inputs.
This prepares consolidated data for PII removal.
Step 6: Remove the PII Columns via Code Node
Add a Code node named “Remove PII columns”.
Use this JavaScript code:
// Input: All items from the previous node
const input = $input.all();
// Step 1: Extract the PII column names from the first item
const firstItem = input[0];
if (!firstItem.json.data || !firstItem.json.data) {
throw new Error("PII column names are missing in the input data.");
}
const piiColumns = firstItem.json.data.split(',').map(col => col.trim());
// Step 2: Remove the first two items and process the rest
let rows = input.slice(2).map(item => item.json);
if (rows.length === 0) {
throw new Error("No rows to convert to CSV.");
}
// Step 3: Remove PII columns from each row
const sanitizedRows = rows.map(row => {
const sanitizedRow = { ...row };
piiColumns.forEach(column => delete sanitizedRow[column]);
return sanitizedRow;
});
// Step 4: Extract headers from sanitized rows
const headers = Object.keys(sanitizedRows[0]);
// Step 5: Convert rows to CSV
const csvRows = [
headers.join(','),
...sanitizedRows.map(row => headers.map(header => String(row[header] || '').replace(/,/g, '')).join(','))
];
const csvContent = csvRows.join('n');
const originalFileName = input[1].json.originalFilename;
const fileExtension = originalFileName.split('.').pop();
const baseName = originalFileName.replace(`.${fileExtension}`, '');
const newFileName = `${baseName}_PII_removed.${fileExtension}`;
return [ { json: { fileName: newFileName, content: csvContent } } ];
This code removes sensitive columns and prepares a new CSV content string and filename.
Step 7: Upload Sanitized File Back to Google Drive
Finish by adding another Google Drive node.
Set operation to “createFromText”.
Map the new filename and content from the Code node output.
Specify the destination folder ID where sanitized files should be saved.
Authenticate with your Google Drive account.
Run a test upload; the file should appear in your target folder.
5. Customizations ✏️
- Change Folder to Monitor: In the Google Drive Trigger node, update the folder ID to any folder you want watched.
- Adjust OpenAI Model: Swap the OpenAI model from “gpt-4o-mini” to a larger or cheaper model as per your API limits.
- File Type Filters: Add conditional logic before processing to restrict only CSV files by checking the file extension in the Google Drive Trigger output.
- Additional Sanitization: Enhance the Code node to mask rather than remove PII columns if needed, by replacing values instead of deleting fields.
- Notification Integration: Add a notification node (Slack, email) after upload to alert your team when files are sanitized.
6. Troubleshooting 🔧
- Problem: “PII column names are missing in the input data.”
Cause: The OpenAI node did not return expected PII columns.
Solution: Check if OpenAI prompt inputs match your CSV column headers and ensure the OpenAI API key is valid and has usage quota. - Problem: “No rows to convert to CSV.”
Cause: The CSV file might be empty or the Extract from File node failed.
Solution: Verify the CSV file is properly formatted and test the Extract from File node separately. - Problem: Google Drive upload fails.
Cause: Incorrect folder ID or insufficient permissions.
Solution: Double-check folder IDs and that your Google Drive API credentials have write access.
7. Pre-Production Checklist ✅
- Confirm Google Drive folder ID to watch is correct and accessible.
- Test OpenAI node independently with sample headers to verify PII detection.
- Upload a test CSV via Google Drive to trigger the workflow end-to-end.
- Ensure sanitized file appears in the specified output Drive folder with proper naming.
- Backup original CSV files regularly before running automated processes.
8. Deployment Guide
After composing this workflow in n8n, activate the workflow toggle to enable it.
Ensure your n8n instance remains online—cloud instances usually handle this automatically.
Monitor initial runs for errors in the n8n executions window.
Adjust logging or add notification nodes if error alerts are needed.
This workflow scales with file frequency but monitor API quota usage with OpenAI and Google Drive.
9. FAQs
- Can I use other AI models instead of OpenAI?
Yes, any AI capable of analyzing tabular data for PII can be integrated by replacing the OpenAI node and adjusting the prompt. - Does this workflow consume many API credits?
It makes one OpenAI call per file; costs depend on your file volume and model used. - Is my data safe?
All file processing happens within your n8n instance securely; ensure your API keys remain secure and use encrypted credentials. - Can this handle hundreds of files daily?
Depending on your API limits and n8n hosting, it can scale well but may require quota monitoring.
10. Conclusion
By following this detailed tutorial, you automated the detection and removal of PII from CSV files uploaded to Google Drive, freeing Sarah (and you) from hours of manual, error-prone work daily.
This workflow enhances data compliance, reduces risk, and expedites file processing, potentially saving multiple workdays each month.
Next, consider automating email notifications for processed files, integrating Slack alerts, or expanding to handle other file formats like Excel.
Get started today, and secure your data processing with smart automation!