Opening Problem Statement
Meet Sarah, a data analyst at a fast-growing marketing company. Every day, her team uploads new customer data CSV files to a shared Google Drive folder. Before sharing insights or sending reports, Sarah must manually scan these files for personally identifiable information (PII) such as names, emails, or phone numbers. This tedious process takes hours, is prone to errors, and creates compliance risks when PII is accidentally leaked.
Sarah's company handles dozens of CSV files weekly, and each manual review costs her more than two hours, plus potential legal headaches if PII slips through. She needs a reliable, automated way to detect and remove PII columns promptly so her team can focus on analysis without risk.
What This Automation Does
This n8n workflow does exactly that. When a new CSV file appears in a specified Google Drive folder, the automation kicks in and:
- Automatically detects the file creation event in the monitored Drive folder.
- Downloads the CSV file content from Google Drive.
- Extracts tabular data from the CSV for analysis.
- Uses an OpenAI GPT-4 model (gpt-4o-mini in this guide) to analyze the data headers and identify PII columns.
- Programmatically removes the identified PII columns from the data.
- Generates and saves a clean, PII-free CSV file back to a separate Google Drive folder.
This process eliminates hours of manual work and reduces the risk of leaking PII, which supports compliance with data privacy standards.
Prerequisites
- Google Drive account with API access to the folders used
- OpenAI account with an API key for GPT-4 or a compatible model
- n8n automation platform account (cloud or self-hosted)
- CSV files uploaded to the specific Google Drive folder to trigger the workflow
Optional: You can self-host n8n for full control and security. If interested, services like Hostinger with n8n offer reliable options.
Step-by-Step Guide
1. Set up Google Drive Trigger to monitor new files
In n8n, create a new workflow and add the Google Drive Trigger node.
- Navigate to Node panel → Google Drive Trigger.
- Configure the node to watch for fileCreated events.
- Set polling to every minute for near real-time processing.
- Choose specificFolder mode, then specify your folder ID (e.g., 1-hRMnBRYgY6iVJ_youKMyPz83k9GAVYu).
- Connect your Google Drive OAuth2 credentials.
- After configuring, save and test by uploading a new CSV to the folder; the node should detect it.
Common Mistake: Forgetting to select the correct folder ID or Google Drive credentials leads to no triggers.
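The folder ID is the last path segment of the folder's URL in Google Drive. A minimal sketch for pulling it out of a pasted URL, assuming the common https://drive.google.com/drive/folders/<id> form. The helper name folderIdFromUrl is hypothetical and not part of n8n:

```javascript
// Hypothetical helper: extract a folder ID from a Google Drive folder URL.
// Assumes the URL path contains ".../folders/<id>".
function folderIdFromUrl(url) {
  const match = new URL(url).pathname.match(/\/folders\/([A-Za-z0-9_-]+)/);
  if (!match) {
    throw new Error(`No folder ID found in URL: ${url}`);
  }
  return match[1];
}

const exampleUrl =
  "https://drive.google.com/drive/folders/1-hRMnBRYgY6iVJ_youKMyPz83k9GAVYu?usp=sharing";
console.log(folderIdFromUrl(exampleUrl));
```

Paste the returned value into the trigger's folder field rather than the full URL.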
2. Download the newly created file from Drive
Add the Google Drive node next to the trigger.
- Set operation to download.
- Map the fileId field to the ID output from the trigger node: {{$json.id}}.
- Choose data as the binary property name to hold the file content.
- Use the same Google Drive OAuth2 credentials.
- Test to confirm the workflow downloads the file successfully.
Common Mistake: Not mapping the fileId exactly from the trigger output prevents file download.
3. Extract tabular data from the downloaded file
Add the Extract from File node.
- Leave the options at their defaults; the node auto-detects CSV content.
- This node parses the CSV and converts it into structured JSON for processing.
- Test with your CSV file to see the extracted data in JSON format.
Common Mistake: Uploading a non-CSV or corrupt file will cause extract errors.
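One way to avoid that failure is to skip non-CSV files before the download step, for example in a Code node or an equivalent IF node. A sketch, assuming the trigger item exposes the Drive fields name and mimeType (name is used later in this guide; mimeType is an assumption about the trigger output). The function name looksLikeCsv is hypothetical:

```javascript
// Hypothetical guard: decide whether a Drive file looks like a CSV.
// Accepts a file if either its name or its MIME type says CSV.
function looksLikeCsv(file) {
  const name = String(file.name ?? "").toLowerCase();
  const mimeType = String(file.mimeType ?? "").toLowerCase();
  return name.endsWith(".csv") || mimeType === "text/csv";
}

const sampleFiles = [
  { name: "customers.csv", mimeType: "text/csv" },
  { name: "notes.pdf", mimeType: "application/pdf" },
];
console.log(sampleFiles.filter(looksLikeCsv).map((f) => f.name));
```

This only filters by metadata; a file named .csv with corrupt content would still fail in the Extract from File node.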
4. Analyze data headers to identify PII columns via OpenAI GPT-4
Add the OpenAI node (LangChain variant) configured with your API key.
- Select the model gpt-4o-mini.
- Insert a system prompt instructing the AI to analyze the table headers and identify the PII columns.
- The prompt template echoes headers and example row values dynamically from the extracted JSON data.
- Output JSON is enabled to handle the AI response cleanly.
Prompt snippet:
Analyze the provided tabular data and identify the columns that contain personally identifiable information (PII). Return only the column names that contain PII, separated by commas.
Common Mistake: An incorrect prompt or model ID causes poor PII detection.
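The user message can be assembled from the extracted rows. The sketch below shows one way to echo the headers and a single example row. The function name buildPiiPrompt and the layout of the message are assumptions for illustration, not the workflow's actual template:

```javascript
// Hypothetical prompt builder: instruction text plus headers and one example row.
const PII_INSTRUCTION =
  "Analyze the provided tabular data and identify the columns that contain " +
  "personally identifiable information (PII). Return only the column names " +
  "that contain PII, separated by commas.";

function buildPiiPrompt(rows) {
  if (rows.length === 0) {
    throw new Error("No rows available to build the prompt.");
  }
  const headers = Object.keys(rows[0]);
  // One "header: value" line per column, taken from the first row only.
  const example = headers.map((h) => `${h}: ${rows[0][h]}`).join("\n");
  return [
    PII_INSTRUCTION,
    "",
    `Headers: ${headers.join(", ")}`,
    "Example row:",
    example,
  ].join("\n");
}

console.log(buildPiiPrompt([{ name: "Ann", email: "ann@example.com", plan: "pro" }]));
```

Note that the example row contains real values from the file. Sending only the headers, or masked values, would reduce what leaves your environment, possibly at some cost in detection accuracy.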
5. Separate AI response message content for processing
Add a Split Out node named Get result.
- Configure it to extract the field message.content.content, where the PII columns are returned.
- This prepares the data to be merged later.
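Because the model's answer is free text, it may help to check the returned names against the real headers before deleting anything. A sketch, assuming the response is the comma-separated string the prompt asks for. parsePiiColumns is a hypothetical helper, not an n8n function:

```javascript
// Hypothetical helper: split the model's answer and separate names that
// match a real header from names that do not (typos, case differences).
function parsePiiColumns(responseText, headers) {
  const known = new Set(headers);
  const names = responseText
    .split(",")
    .map((col) => col.trim())
    .filter((col) => col.length > 0);
  return {
    matched: names.filter((col) => known.has(col)),
    unknown: names.filter((col) => !known.has(col)),
  };
}

const parsed = parsePiiColumns("name, email, Phone", ["name", "email", "phone", "plan"]);
console.log(parsed.matched); // [ 'name', 'email' ]
console.log(parsed.unknown); // [ 'Phone' ]
```

A non-empty unknown list is a signal worth logging: the column the model meant would otherwise survive in the output untouched.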
6. Extract original filename
Add another Split Out node named Get filename.
- Set it to split out the name field from the trigger output, saving it as originalFilename.
- This ensures the sanitized file keeps a similar name, with a suffix added.
7. Merge AI response, filename, and extracted data
Add a Merge node with 3 inputs:
- Input 1: PII columns from OpenAI processed output
- Input 2: Original filename
- Input 3: Extracted CSV data rows
This consolidates all required pieces to remove PII effectively.
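The next step depends on the merged items arriving in this order. The sketch below unpacks them with an explicit check, assuming the Merge node appends its inputs in order and that the field names are data and originalFilename, as used in the Code node of this guide. splitMergedItems is a hypothetical helper:

```javascript
// Hypothetical helper: unpack merged n8n-style items ({ json: {...} }).
// Expected order: [PII columns, original filename, ...data rows].
function splitMergedItems(items) {
  if (items.length < 3) {
    throw new Error("Expected PII columns, a filename, and at least one data row.");
  }
  const [piiItem, nameItem, ...rowItems] = items;
  return {
    piiColumns: piiItem.json.data,
    originalFilename: nameItem.json.originalFilename,
    rows: rowItems.map((item) => item.json),
  };
}

const mergedExample = [
  { json: { data: "name, email" } },
  { json: { originalFilename: "customers.csv" } },
  { json: { name: "Ann", email: "ann@example.com", plan: "pro" } },
];
console.log(splitMergedItems(mergedExample).rows.length);
```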
8. Remove PII columns with Code node
Add a Code node named Remove PII columns.
- Paste the following JavaScript code exactly to sanitize data:
// Items arrive from the Merge node in order: PII columns, filename, then data rows.
const input = $input.all();
const firstItem = input[0];
if (!firstItem.json.data) {
  throw new Error("PII column names are missing in the input data.");
}
// The model returns a comma-separated list of column names.
const piiColumns = firstItem.json.data.split(',').map(col => col.trim());
// Skip the first two items (PII columns and filename); the rest are CSV rows.
const rows = input.slice(2).map(item => item.json);
if (rows.length === 0) {
  throw new Error("No rows to convert to CSV.");
}
// Copy each row and drop the PII columns from the copy.
const sanitizedRows = rows.map(row => {
  const sanitizedRow = { ...row };
  piiColumns.forEach(column => delete sanitizedRow[column]);
  return sanitizedRow;
});
// Rebuild the CSV. Commas inside values are stripped rather than quoted.
const headers = Object.keys(sanitizedRows[0]);
const csvRows = [
  headers.join(','),
  ...sanitizedRows.map(row =>
    // ?? keeps legitimate falsy values such as 0 or false.
    headers.map(header => String(row[header] ?? '').replace(/,/g, '')).join(',')
  )
];
const csvContent = csvRows.join('\n');
// Build the new filename by inserting the suffix before the extension.
const originalFileName = input[1].json.originalFilename;
const fileExtension = originalFileName.split('.').pop();
const baseName = originalFileName.replace(`.${fileExtension}`, '');
const newFileName = `${baseName}_PII_removed.${fileExtension}`;
return [
  {
    json: {
      fileName: newFileName,
      content: csvContent
    }
  }
];
The sanitized file is named after the original, with the suffix _PII_removed.
Common Mistake: Typos or not slicing the input properly cause errors or missing data.
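The Code node strips commas from values so that columns stay aligned, which alters the data. An alternative is to quote fields in the style described by RFC 4180: wrap a field in double quotes when it contains a comma, quote, or line break, and double any embedded quotes. A sketch under that assumption; escapeCsvField and toCsv are hypothetical helper names:

```javascript
// Quote a single CSV field only when it needs quoting.
function escapeCsvField(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize an array of flat objects, using the first row's keys as headers.
function toCsv(rows) {
  const headers = Object.keys(rows[0]);
  const lines = [headers, ...rows.map((row) => headers.map((h) => row[h]))];
  return lines.map((fields) => fields.map(escapeCsvField).join(",")).join("\n");
}

console.log(toCsv([{ city: "Austin, TX", note: 'said "hi"' }]));
```

To try this in the workflow, the csvRows construction could be replaced with a call to toCsv(sanitizedRows).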
9. Upload sanitized CSV back to Google Drive
Add a final Google Drive node set to createFromText operation.
- Map fileName and content from the code node output.
- Choose the destination folder ID for processed files.
- Use the same Google Drive OAuth2 account.
- Run a test upload and confirm the new sanitized file appears in your target folder.
Common Mistake: Forgetting to set the folder ID, or choosing “My Drive” instead of a folder, sends uploads to the wrong location.
Customizations
- Change monitored folder: In the Google Drive Trigger node, update the folderToWatch ID to any folder your team uses.
- Use a different OpenAI model: In the OpenAI node, change the modelId to another GPT-4 variant or your preferred model. Ensure the prompt still fits.
- Adjust polling frequency: In the Google Drive Trigger node, change pollTimes to trigger every 5 or 10 minutes to reduce API calls.
- Modify filename suffix: In the Code node, update the newFileName format to add different suffixes or prefixes.
- Save sanitized files to subfolders: Change folderId in the Upload to Drive node to any backup or archive folder.
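For the filename customization, note that the Code node derives the extension by splitting on dots, so a name without an extension (for example README) would come out as README_PII_removed.README. A sketch of a more defensive variant with a configurable suffix; withSuffix is a hypothetical helper name:

```javascript
// Insert a suffix before the last extension; append it if there is none.
function withSuffix(fileName, suffix = "_PII_removed") {
  const dot = fileName.lastIndexOf(".");
  // dot <= 0 covers "no dot at all" and dotfiles such as ".hidden".
  if (dot <= 0) {
    return `${fileName}${suffix}`;
  }
  return `${fileName.slice(0, dot)}${suffix}${fileName.slice(dot)}`;
}

console.log(withSuffix("customers.csv")); // customers_PII_removed.csv
console.log(withSuffix("report.v2.csv", "_clean")); // report.v2_clean.csv
console.log(withSuffix("README")); // README_PII_removed
```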
Troubleshooting
Problem: “PII column names are missing in the input data.”
Cause: The OpenAI node did not return a proper list of column names, or the Split Out node isn't configured correctly.
Solution: Check OpenAI node output in execution logs. Confirm prompt validity and that Get result node extracts message.content.content exactly.
Problem: New files don't trigger the workflow
Cause: Incorrect Google Drive folder ID or credentials in the trigger node.
Solution: Reconfirm folder ID from Google Drive URL. Test credential authentication. Adjust polling frequency if needed.
Problem: Uploaded sanitized file is empty or malformed
Cause: Errors in the Code node or corrupted CSV input to the Extract from File node.
Solution: Validate CSV file structure, check JavaScript code syntax in the Remove PII columns node, and ensure data flows correctly in merge.
Pre-Production Checklist
- Test uploading various sample CSV files to the watched Google Drive folder to ensure trigger detection.
- Monitor OpenAI responses to confirm correct PII column identification.
- Validate sanitized CSV outputs manually to verify PII columns are removed.
- Back up original files before enabling the automation, or ensure Drive version history is enabled.
- Verify all Google Drive credentials and permissions are current and adequate.
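The output validation in this checklist can be partly scripted. A sketch that reads the header line of a sanitized CSV and reports any PII column names still present. It assumes headers contain no quoted commas, and findLeakedColumns is a hypothetical helper name:

```javascript
// Return the PII column names that still appear in the CSV header line.
function findLeakedColumns(csvContent, piiColumns) {
  const headerLine = csvContent.split(/\r?\n/)[0];
  const headers = headerLine.split(",").map((h) => h.trim());
  return piiColumns.filter((col) => headers.includes(col));
}

const sanitizedCsv = "plan,signup_date\npro,2024-01-15";
console.log(findLeakedColumns(sanitizedCsv, ["name", "email"]));
```

An empty result only shows that the named columns are gone. It does not catch PII the model failed to identify, so a manual spot check is still worthwhile.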
Deployment Guide
Once tested, activate the workflow in n8n by toggling it ON.
The automation will then monitor your Google Drive folder continuously (every minute by default), handle new CSV files, and upload sanitized versions seamlessly.
Review the n8n execution logs regularly to monitor successes and errors as part of ongoing maintenance.
FAQs
Can I use an API other than OpenAI?
While this workflow uses OpenAI GPT-4 for PII detection, you could swap the AI analysis node for another provider that supports text classification on tabular input. Ensure the output format matches what the code node expects.
Does running this workflow consume API credits?
Yes, every OpenAI API call consumes credits based on your subscription. Frequent polling also counts towards Google API usage. Adjust polling frequency to manage cost.
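To get a feel for the polling side of that cost, the number of polls per day follows directly from the interval. A small sketch; treating each poll as one Drive API request is an assumption, and pollsPerDay is a hypothetical helper name:

```javascript
// Number of polls in 24 hours for a given polling interval in minutes.
function pollsPerDay(intervalMinutes) {
  return Math.floor((24 * 60) / intervalMinutes);
}

for (const minutes of [1, 5, 10]) {
  console.log(`${minutes} min interval: ${pollsPerDay(minutes)} polls/day`);
}
```

Moving from a 1-minute to a 5-minute interval cuts polling from 1440 to 288 checks per day, at the price of files waiting up to five minutes before processing.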
Is my data safe during processing?
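Data is processed within n8n and sent to OpenAI’s API for analysis. Because the prompt includes the headers and example row values, some raw values leave your environment. Avoid sharing confidential data outside trusted environments. Self-hosting n8n can enhance privacy protections.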
Can this handle high volumes of CSV files?
As long as API rate limits and polling intervals are respected, n8n can handle moderate volumes. For heavy loads, scale by adjusting polling or batching inputs.
Conclusion
By following this guide, you've built a low-code automation that detects and removes PII columns from CSV files in Google Drive automatically using n8n and OpenAI. This saves Sarah and her team multiple hours weekly, reduces compliance risks, and frees them to focus on insights rather than tedious data cleaning.
Next, consider automating the anonymized dataset distribution to analytics teams or triggering alerts when PII columns are found for auditing.
Keep experimenting with n8n workflows. Automating data privacy can save your team valuable time and strengthen your compliance posture.