Remove PII from Google Drive CSVs with n8n & OpenAI

This workflow automates the removal of personally identifiable information (PII) from CSV files in Google Drive. It monitors a specific folder, identifies PII columns with OpenAI, and saves sanitized files back to Drive, which cuts manual data-cleaning errors and can save hours each week.
Workflow Identifier: 1550
Nodes in use: Google Drive Trigger, Google Drive, Extract from File, OpenAI, Merge, Upload to Drive, Split Out, Code, Sticky Note


Opening Problem Statement

Meet Sarah, a data analyst at a fast-growing marketing company. Every day, her team uploads new customer data CSV files to a shared Google Drive folder. Before sharing insights or sending reports, Sarah must manually scan these files for sensitive personal information (PII) like names, emails, or phone numbers. This tedious process takes hours, is prone to errors, and creates compliance risks when PII is accidentally leaked.

Sarah’s company handles dozens of CSV files weekly, and each manual review costs her well over 2 hours of lost time and potential legal headaches if a PII slip-up occurs. She needs a reliable, automated way to detect and remove PII columns promptly so her team can focus on analysis without risk.

What This Automation Does

This unique n8n workflow achieves exactly that. When a new CSV file appears in a specified Google Drive folder, the automation kicks in and:

  • Automatically detects the file creation event in the monitored Drive folder.
  • Downloads the CSV file content from Google Drive.
  • Extracts tabular data from the CSV for analysis.
  • Uses an OpenAI model (gpt-4o-mini in this guide) to analyze the data headers and identify PII columns.
  • Programmatically removes the identified PII columns from the data.
  • Generates and saves a clean, PII-free CSV file back to a separate Google Drive folder.

This process removes hours of manual work, lowers the risk of leaking PII, and supports compliance with data privacy standards.

Prerequisites ⚙️

  • Google Drive Account with appropriate API access to the folders used 📁🔑
  • OpenAI Account with API key access for GPT-4 or compatible model
  • n8n automation platform account (cloud or self-hosted) 🔌
  • CSV files uploaded to the specific Google Drive folder to trigger the workflow

Optional: You can self-host n8n for full control and security. If interested, services like Hostinger with n8n offer reliable options.

Step-by-Step Guide

1. Set up Google Drive Trigger to monitor new files

In n8n, create a new workflow and add the Google Drive Trigger node.

  • Navigate to Node panel → Google Drive Trigger.
  • Configure the node to watch for fileCreated events.
  • Set polling to every minute for near real-time processing.
  • Choose specificFolder mode, then specify your folder ID (e.g., 1-hRMnBRYgY6iVJ_youKMyPz83k9GAVYu).
  • Connect your Google Drive OAuth2 credentials.
  • After configuring, save and test by uploading a new CSV to the folder; the node should detect it.

Common Mistake: Forgetting to select the correct folder ID or Google Drive credentials leads to no triggers.
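The folder ID is the path segment that follows /folders/ in the folder's Google Drive URL. As a minimal sketch of pulling it out reliably (folderIdFromUrl is a hypothetical helper, not part of the workflow, and the ?usp=sharing query string is only an example):

```javascript
// Hypothetical helper: pull the folder ID out of a Google Drive folder URL.
function folderIdFromUrl(url) {
  // Folder URLs contain ".../folders/<id>", optionally followed by a query string.
  const match = /\/folders\/([A-Za-z0-9_-]+)/.exec(url);
  return match ? match[1] : null;
}

const exampleUrl =
  'https://drive.google.com/drive/folders/1-hRMnBRYgY6iVJ_youKMyPz83k9GAVYu?usp=sharing';
console.log(folderIdFromUrl(exampleUrl));
```

A null result means the URL does not point at a folder, which is worth checking before pasting a value into the trigger node.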

2. Download the newly created file from Drive

Add the Google Drive node next to the trigger.

  • Set operation to download.
  • Map the fileId field to the ID output from the trigger node: {{$json.id}}.
  • Choose data as the binary property name to hold the file content.
  • Use the same Google Drive OAuth2 credentials.
  • Test to confirm the workflow downloads the file successfully.

Common Mistake: Not mapping the fileId exactly from the trigger output prevents file download.

3. Extract tabular data from the downloaded file

Add the Extract from File node.

  • Leave options default as it auto-detects CSV content.
  • This node parses the CSV and converts it into structured JSON for processing.
  • Test with your CSV file to see the extracted data in JSON format.

Common Mistake: Uploading a non-CSV or corrupt file will cause extract errors.

4. Analyze data headers to identify PII columns via OpenAI

Add the OpenAI node (LangChain variant) configured with your API key.

  • Select the model gpt-4o-mini.
  • Insert system prompt to instruct the AI to analyze table headers and identify PII columns exactly.
  • The prompt template echoes headers and example row values dynamically from the extracted JSON data.
  • Output JSON is enabled to handle the AI response cleanly.

Prompt snippet:

Analyze the provided tabular data and identify the columns that contain personally identifiable information (PII). Return only the column names that contain PII, separated by commas.

Common Mistake: Incorrect prompt or model ID causes poor PII detection.
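To make the "headers plus example row values" idea concrete, here is a rough sketch of how that user message could be assembled. In the workflow this is done with n8n expressions inside the OpenAI node; the function buildPiiPrompt, its sampleSize parameter, and the sample rows below are illustrative assumptions only.

```javascript
// Hypothetical helper: build the user message from the extracted rows.
// `rows` is assumed to be an array of plain objects, one per CSV row.
function buildPiiPrompt(rows, sampleSize = 1) {
  if (rows.length === 0) {
    throw new Error('No rows available to describe the table.');
  }
  const headers = Object.keys(rows[0]);
  // Render a few example rows so the model sees values as well as names.
  const samples = rows.slice(0, sampleSize).map(row =>
    headers.map(header => String(row[header] ?? '')).join(' | ')
  );
  return [
    `Headers: ${headers.join(', ')}`,
    'Example rows:',
    ...samples,
  ].join('\n');
}

const rows = [
  { name: 'Jane Doe', email: 'jane@example.com', plan: 'pro' },
  { name: 'John Roe', email: 'john@example.com', plan: 'free' },
];
console.log(buildPiiPrompt(rows));
```

Keeping sampleSize small limits how much raw data is sent to the model while still giving it values to judge by.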

5. Separate AI response message content for processing

Add a Split Out node named Get result.

  • Configure it to extract the field message.content.content, where the PII columns are returned, and output it as data (the field name the Code node in step 8 reads).
  • This prepares the data to be merged later.
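Model output is not always perfectly tidy, so it can help to normalize the answer before using it. The sketch below is an assumption about how that could look; parsePiiColumns is a hypothetical helper, and the exact shape of the response depends on how the OpenAI node is configured.

```javascript
// Hypothetical helper: turn the model's answer into a clean list of column names.
function parsePiiColumns(answer) {
  // Accept either a plain string or an object with a `content` string.
  const text = typeof answer === 'string' ? answer : answer?.content;
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('PII column names are missing in the input data.');
  }
  return text
    .split(',')
    .map(col => col.trim())
    .filter(col => col.length > 0); // drop empties left by trailing commas
}

console.log(parsePiiColumns({ content: 'name, email , phone,' }));
```

Filtering out empty entries avoids attempting to delete a column named with an empty string later on.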

6. Extract original filename

Add another Split Out node named Get filename.

  • Set it to split out the name field from the trigger output, saving as originalFilename.
  • This ensures the sanitized file has a similar name but with a suffix.

7. Merge AI response, filename, and extracted data

Add a Merge node with 3 inputs:

  • Input 1: PII columns from OpenAI processed output
  • Input 2: Original filename
  • Input 3: Extracted CSV data rows

This consolidates everything the next step needs. Keep the inputs in this order, because the Code node in step 8 reads the merged items by position.
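The Code node in step 8 expects the merged items in a fixed order: the PII column list first, the filename second, and the data rows after that. The following sketch mimics that item list with made-up sample values and adds a hypothetical guard, checkMergedOrder, that could be run before any column is removed.

```javascript
// Illustrative item list, mimicking what $input.all() might return after the Merge node.
const mergedItems = [
  { json: { data: 'name, email' } },               // input 1: PII columns
  { json: { originalFilename: 'customers.csv' } }, // input 2: filename
  { json: { name: 'Jane Doe', email: 'jane@example.com', plan: 'pro' } },  // input 3: rows
  { json: { name: 'John Roe', email: 'john@example.com', plan: 'free' } },
];

// Hypothetical guard: report the first problem found with the item order.
function checkMergedOrder(items) {
  if (items.length < 3) return 'expected at least 3 items';
  if (typeof items[0].json.data !== 'string') return 'item 0 has no PII column list';
  if (typeof items[1].json.originalFilename !== 'string') return 'item 1 has no filename';
  return 'ok';
}

console.log(checkMergedOrder(mergedItems));
```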

8. Remove PII columns with Code node

Add a Code node named Remove PII columns.

  • Paste the following JavaScript code to sanitize the data:

    // Items arrive in Merge order: [0] PII column list, [1] filename, [2+] CSV rows.
    const input = $input.all();
    const firstItem = input[0];
    if (!firstItem || !firstItem.json.data) {
      throw new Error("PII column names are missing in the input data.");
    }
    // Split the comma-separated answer into trimmed column names.
    const piiColumns = firstItem.json.data.split(',').map(col => col.trim());
    const rows = input.slice(2).map(item => item.json);
    if (rows.length === 0) {
      throw new Error("No rows to convert to CSV.");
    }
    // Copy each row and drop the PII columns from the copy.
    const sanitizedRows = rows.map(row => {
      const sanitizedRow = { ...row };
      piiColumns.forEach(column => delete sanitizedRow[column]);
      return sanitizedRow;
    });
    // Rebuild the CSV. Commas inside values are stripped rather than quoted.
    const headers = Object.keys(sanitizedRows[0]);
    const csvRows = [
      headers.join(','),
      ...sanitizedRows.map(row =>
        // Use ?? so that legitimate values such as 0 or false are not blanked out.
        headers.map(header => String(row[header] ?? '').replace(/,/g, '')).join(',')
      )
    ];
    const csvContent = csvRows.join('\n');
    // Insert the suffix before the file extension.
    const originalFileName = input[1].json.originalFilename;
    const fileExtension = originalFileName.split('.').pop();
    const baseName = originalFileName.replace(`.${fileExtension}`, '');
    const newFileName = `${baseName}_PII_removed.${fileExtension}`;
    return [
      {
        json: {
          fileName: newFileName,
          content: csvContent
        }
      }
    ];

  • This script removes all PII columns from data rows and regenerates a CSV string with a new filename suffix _PII_removed.

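Common Mistake: Typos in the code, or a Merge order that differs from the one in step 7, cause errors or missing data. The script reads the PII column list from item 0, the filename from item 1, and the rows from item 2 onward.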
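The script above strips commas out of values instead of quoting them, which changes data such as "Portland, OR". If preserving values matters, one alternative is to quote fields in the usual CSV style. This is a sketch rather than the workflow's own code; csvField and toCsv are hypothetical names, and toCsv assumes a non-empty array of rows.

```javascript
// Quote a single value in the usual CSV style (as described in RFC 4180):
// wrap it in double quotes when it contains a comma, quote, or line break,
// and double any embedded quotes.
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV string from an array of row objects (assumed non-empty).
function toCsv(rows) {
  const headers = Object.keys(rows[0]);
  const lines = [
    headers.map(csvField).join(','),
    ...rows.map(row => headers.map(h => csvField(row[h])).join(',')),
  ];
  return lines.join('\n');
}

console.log(toCsv([{ city: 'Portland, OR', visits: 0, note: 'said "hi"' }]));
```

Swapping this in for the csvRows construction would keep commas and quotes intact at the cost of a slightly longer script.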

9. Upload sanitized CSV back to Google Drive

Add a final Google Drive node set to createFromText operation.

  • Map fileName and content from the code node output.
  • Choose the destination folder ID for processed files.
  • Use the same Google Drive OAuth2 account.
  • Test upload confirms the new sanitized file appears in your target folder.

Common Mistake: Forgetting to set folder ID or choosing “My Drive” instead of a folder causes uploads to wrong location.

Customizations ✏️

  • Change monitored folder: In the Google Drive Trigger node, update the folderToWatch ID to any folder your team uses.
  • Use different OpenAI model: In the OpenAI node, change the modelId to another GPT-4 variant or your preferred model. Ensure prompt still fits.
  • Adjust polling frequency: In the Google Drive Trigger node, change pollTimes to trigger every 5 or 10 minutes to reduce API calls.
  • Modify filename suffix: In the Code node, update newFileName format to add different suffixes or prefixes.
  • Save sanitized files to subfolders: Change folderId in the Upload to Drive node to any backup or archive folder.
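When changing the filename format, note that the step 8 script removes the first occurrence of the extension text, which can misbehave for names like data.csv.backup.csv or names with no extension. A possible replacement, offered as a sketch with a hypothetical helper name withSuffix:

```javascript
// Hypothetical helper: insert a suffix before the last extension, if there is one.
function withSuffix(fileName, suffix = '_PII_removed') {
  const dot = fileName.lastIndexOf('.');
  // No dot, or only a leading dot (e.g. ".env"): treat the whole name as the base.
  if (dot <= 0) return `${fileName}${suffix}`;
  return `${fileName.slice(0, dot)}${suffix}${fileName.slice(dot)}`;
}

console.log(withSuffix('customers.2024.csv'));
console.log(withSuffix('customers', '_clean'));
```

Passing a different second argument is enough to change the suffix without touching the rest of the logic.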

Troubleshooting 🔧

Problem: “PII column names are missing in the input data.”

Cause: The OpenAI node did not return a comma-separated list of column names, or the Split Out node isn’t configured correctly.

Solution: Check OpenAI node output in execution logs. Confirm prompt validity and that Get result node extracts message.content.content exactly.

Problem: New files don’t trigger workflow

Cause: Incorrect Google Drive folder ID or credentials in the trigger node.

Solution: Reconfirm folder ID from Google Drive URL. Test credential authentication. Adjust polling frequency if needed.

Problem: Uploaded sanitized file is empty or malformed

Cause: Errors in the Code node or corrupted CSV input to the Extract from File node.

Solution: Validate CSV file structure, check JavaScript code syntax in the Remove PII columns node, and ensure data flows correctly in merge.

Pre-Production Checklist ✅

  • Test uploading various sample CSV files to the watched Google Drive folder to ensure trigger detection.
  • Monitor OpenAI responses to confirm correct PII column identification.
  • Validate sanitized CSV outputs manually to verify PII columns are removed.
  • Back up original files before running the automation, or ensure Drive version history is enabled.
  • Verify all Google Drive credentials and permissions are current and adequate.
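Part of the manual validation can be scripted. The sketch below checks a sanitized CSV header against the list of PII columns and reports any that remain. It compares names case-insensitively, because the delete in step 8 is case-sensitive and would miss a column named Email if the model answered email. The helper name leftoverPiiColumns and the sample data are assumptions for illustration.

```javascript
// Hypothetical check: return the PII column names that still appear in the CSV header.
function leftoverPiiColumns(csvContent, piiColumns) {
  const headerLine = csvContent.split(/\r?\n/)[0];
  // Simple split; assumes header names contain no commas or quotes.
  const headers = headerLine.split(',').map(h => h.trim().toLowerCase());
  return piiColumns.filter(col => headers.includes(col.trim().toLowerCase()));
}

const csv = 'Email,plan\njane@example.com,pro';
console.log(leftoverPiiColumns(csv, ['name', 'email']));
```

An empty array means none of the flagged columns survived; anything else is worth investigating before the file is shared.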

Deployment Guide

Once tested, activate the workflow in n8n by toggling it ON.

The automation will then poll your Google Drive folder (every minute, as configured in step 1), process new CSV files, and upload sanitized versions.

Review the n8n execution logs regularly to monitor successes and errors as part of ongoing maintenance.

FAQs

Can I use an API other than OpenAI?

This workflow uses an OpenAI model for PII detection, but you could swap the AI analysis node for another provider that can classify tabular data from a text prompt. Ensure the output format matches what the Code node expects: a comma-separated list of column names.

Does running this workflow consume API credits?

Yes, every OpenAI API call consumes credits based on your subscription. Frequent polling also counts towards Google API usage. Adjust polling frequency to manage cost.

Is my data safe during processing?

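Data passes through n8n and OpenAI’s API. Because the prompt includes the headers and example row values, a sample of the raw data, PII included, is sent to OpenAI for analysis. Avoid sharing confidential data outside trusted environments. Self-hosting n8n can enhance privacy protections.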

Can this handle high volumes of CSV files?

As long as API rate limits and polling intervals are respected, n8n can handle moderate volumes. For heavy loads, scale by adjusting polling or batching inputs.

Conclusion

By following this guide, you’ve built a low-code automation that detects and removes PII columns from CSV files in Google Drive using n8n and OpenAI. This saves Sarah and her team multiple hours weekly, reduces compliance risk, and frees them to focus on insights rather than tedious data cleaning.

Next, consider automating the anonymized dataset distribution to analytics teams or triggering alerts when PII columns are found for auditing.

Keep experimenting with n8n workflows. Automating data privacy can save your team valuable time and strengthen your compliance posture.
