Automate Google Drive Duplicate File Cleanup with n8n

Struggling with duplicate files cluttering your Google Drive? This n8n workflow automatically detects duplicates and either flags or trashes them, saving you hours of manual cleanup. Learn how to deploy an effective duplicate file management system with easy configuration options.
Workflow Identifier: 1222
Nodes in Use: Set, Google Drive Trigger, Google Drive, Filter, Code, Switch, If, No Operation

Opening Problem Statement

Meet Sarah, a busy digital marketer who manages numerous project files on Google Drive. Over time, her Drive filled up with multiple copies of the same files. The copies clutter her folders, make it hard to find the right version, and cost her time every day as she identifies and removes them by hand. This slows her work and risks version confusion in collaborative projects.

Manually cleaning duplicates in Google Drive is tedious and error-prone. Sarah estimates she spends at least 30 minutes a day hunting for duplicates and cleaning them up. That adds up to nearly 3 hours per week, time she could spend on creative and strategic tasks.

Wouldn’t it be great if this cleanup could be automated with precise control? Sarah uses an n8n workflow that automatically detects duplicate files in a specified Google Drive folder, then either flags them with a “DUPLICATE-” prefix or sends them to the trash, depending on her chosen preference. Let’s dive into how this works and how you can implement it yourself.

What This Automation Does

This n8n workflow is specifically tailored to Google Drive file deduplication within a chosen folder. Once deployed, here’s exactly what happens when it runs:

  • The Google Drive Trigger polls a defined folder every 15 minutes for newly created files.
  • The workflow fetches all files owned by the specified user in that folder, excluding Google Apps files such as Docs or Sheets, which have no MD5 checksum to compare.
  • Based on the configuration, it sorts files newest first or oldest first to decide which copy to keep.
  • It checks for duplicate files using the MD5 checksum, a fingerprint of the file content, and marks every additional file with the same checksum as a duplicate.
  • Depending on the configuration, duplicates are either renamed with a “DUPLICATE-” prefix (flagged) or sent straight to the Google Drive trash.
  • Files already flagged aren’t flagged again, preventing repetitive processing.

Sarah no longer wastes time on manual duplicate cleanup and avoids accidental deletion of her latest or earliest edits. This automation saves her about 3 hours per week and drastically reduces version confusion.

Prerequisites ⚙️

  • Google Drive Account with files to deduplicate 📁
  • Google Drive OAuth2 credentials configured in n8n to authorize file operations 🔐
  • n8n account, cloud or self-hosted. For self-hosting, consider a reliable provider such as Hostinger 🔌
  • Basic knowledge of the Google Drive folder structure is helpful but not mandatory.

Step-by-Step Guide

Step 1: Set Up the Google Drive Trigger

In the n8n editor, click Add Node → search for and select Google Drive Trigger. Configure it as follows:

  • Event: fileCreated
  • Poll Times: every 15 minutes (default)
  • Folder to Watch: Select the Google Drive folder you want to monitor (or enter its folder ID), e.g., your project folder.

You should see the trigger linked and polling the folder, ready to detect new files. A common mistake is selecting the wrong folder ID, which leads to no files being detected.

Step 2: Configure Parameters with the Set Node

Add a Set node named Config to define parameters:

  • keep: Choose last to keep the latest file or first to keep the oldest.
  • action: Choose between flag to rename duplicates or trash to delete them.
  • owner: Auto-filled with the file owner’s email from the trigger.
  • folder: Folder ID obtained from the trigger or set manually.

This node controls your deduplication behavior. If keep or action is missing or set to a value other than the ones listed above, the Switch nodes later in the workflow have nothing valid to branch on and the workflow will malfunction.
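As a rough illustration of the values the Config node is expected to carry, the check below validates a config object before anything else runs. The helper name validateConfig and the error messages are hypothetical and not part of the original workflow; only the four field names and their allowed values come from the list above.

```javascript
// Hypothetical helper (not part of the original workflow): checks the four
// Config fields described in Step 2 and returns a list of problems found.
const ALLOWED_KEEP = ['first', 'last'];
const ALLOWED_ACTION = ['flag', 'trash'];

function validateConfig(config) {
  const errors = [];
  if (!ALLOWED_KEEP.includes(config.keep)) {
    errors.push(`keep must be one of: ${ALLOWED_KEEP.join(', ')}`);
  }
  if (!ALLOWED_ACTION.includes(config.action)) {
    errors.push(`action must be one of: ${ALLOWED_ACTION.join(', ')}`);
  }
  // Assumption: the owner is identified by an email address.
  if (typeof config.owner !== 'string' || !config.owner.includes('@')) {
    errors.push('owner must be an email address');
  }
  if (typeof config.folder !== 'string' || config.folder.trim() === '') {
    errors.push('folder must be a non-empty folder ID');
  }
  return errors;
}
```

An empty array means the config is usable. In a Code node placed right after Config, a non-empty result could be turned into a thrown error so that a typo stops the run early instead of silently skipping duplicates.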

Step 3: Retrieve Files with Google Drive Node

Insert a Google Drive node configured as follows:

  • Operation: Search for files
  • Filter: Set folderId to the configured folder and use queryString to limit results to files owned by the configured owner.
  • Return All: enabled

This fetches all relevant files to scan for duplicates. Common errors include incorrect query syntax or a wrong folder ID.
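Since query syntax is a common source of errors here, the sketch below shows one way the owner clause could be assembled using the Google Drive search query syntax ('email' in owners). The function names are hypothetical, and adding trashed = false is an assumption on my part rather than something the original workflow is known to do.

```javascript
// Hypothetical helpers for building the queryString value.
// Drive search queries escape backslashes and single quotes with a backslash.
function escapeQueryValue(value) {
  return String(value).replace(/\\/g, '\\\\').replace(/'/g, "\\'");
}

function buildOwnerQuery(owner) {
  return [
    `'${escapeQueryValue(owner)}' in owners`,
    // Assumption: files already in the trash should not be scanned again.
    'trashed = false',
  ].join(' and ');
}
```

The folder restriction is left to the node's folderId filter, as described in the step above, so the query only needs to cover ownership.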

Step 4: Exclude Google Apps files

Add a Filter node named Drop Google Apps files that:

  • Checks each file’s MIME type and excludes those starting with application/vnd.google-apps, such as Docs, Sheets, or Slides.

This ensures that only files with stored binary content, which carry an MD5 checksum, reach the deduplication step.
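The same prefix test can be written as a small function, shown here as a sketch outside n8n. The function name and the sample records are made up for illustration; the MIME prefix is the one named in this step.

```javascript
// Prefix shared by Google-native types (Docs, Sheets, Slides, folders, ...).
const GOOGLE_APPS_PREFIX = 'application/vnd.google-apps';

// Hypothetical helper: keeps only files whose MIME type is not Google-native.
function dropGoogleAppsFiles(files) {
  return files.filter(
    (file) => !String(file.mimeType || '').startsWith(GOOGLE_APPS_PREFIX)
  );
}

// Made-up sample records for illustration.
const sampleFiles = [
  { name: 'brief.pdf', mimeType: 'application/pdf' },
  { name: 'Plan', mimeType: 'application/vnd.google-apps.document' },
  { name: 'Assets', mimeType: 'application/vnd.google-apps.folder' },
];
const binaryOnly = dropGoogleAppsFiles(sampleFiles); // only brief.pdf remains
```

Note that folders also use a vnd.google-apps MIME type, so the same check keeps subfolders out of the comparison.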

Step 5: Choose Deduplication Method

Add a Switch node named Keep First/Last that branches on the keep parameter set earlier:

  • If keep is last, run the Deduplicate Keep Last Code node.
  • If keep is first, run the Deduplicate Keep First Code node.

Each Code node sorts files by creation date and marks duplicates using the MD5 checksum:

// Deduplicate Keep Last example snippet
// Sort a copy newest first, so the first file seen per checksum is the one kept.
const sorted = [...items].sort(
  (a, b) => new Date(b.json.createdTime) - new Date(a.json.createdTime)
);
const seen = new Set(); // MD5 checksums encountered so far
for (const item of sorted) {
  const md5 = item.json.md5Checksum;
  if (!md5) {
    // No checksum to compare, so never treat this file as a duplicate.
    item.json.isDuplicate = false;
    continue;
  }
  item.json.isDuplicate = seen.has(md5);
  seen.add(md5);
}
return sorted;

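Because the list is sorted newest first, the first file seen for each checksum is kept and every later file with the same checksum is marked as a duplicate. Files with a missing or empty MD5 field are never marked as duplicates.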
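The Keep First branch is not shown in the snippet above. A minimal sketch of how both branches could share one function follows, assuming plain file objects with createdTime and md5Checksum fields (inside a Code node these would sit under item.json). The function name markDuplicates is hypothetical and this is not necessarily how the original Code nodes are written.

```javascript
// Hypothetical helper covering both branches: keep = 'last' sorts newest
// first, keep = 'first' sorts oldest first. Returns new objects and leaves
// the input array untouched.
function markDuplicates(files, keep = 'last') {
  const direction = keep === 'first' ? 1 : -1;
  const sorted = [...files].sort(
    (a, b) => direction * (new Date(a.createdTime) - new Date(b.createdTime))
  );
  const seen = new Set();
  return sorted.map((file) => {
    const md5 = file.md5Checksum;
    // Files without a checksum are never duplicates.
    const isDuplicate = Boolean(md5) && seen.has(md5);
    if (md5) seen.add(md5);
    return { ...file, isDuplicate };
  });
}
```

The only difference between the two branches is the sort direction: whichever copy comes first after sorting is the one that survives.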

Step 6: Prepare Duplicate Metadata

The Edit Fields Set node copies and formats metadata fields, including the duplicate flag and essential file info for the next processing steps.

Step 7: Filter Only Duplicates

Add a Filter node set to pass only items where isDuplicate is true. This limits downstream actions to actual duplicates only.

Step 8: Decide Duplicate Handling – Trash or Flag

Insert another Switch node named Trash/Flag Duplicates based on the action parameter:

  • Trash branch: sends duplicates to the Send Duplicates to Trash Google Drive node, which moves each file to the Drive trash.
  • Flag branch: an If node named Is Flagged checks whether the file name already starts with “DUPLICATE-”. Files that are not yet flagged are renamed with the “DUPLICATE-” prefix by the Google Drive Update node.

Files that already carry the “DUPLICATE-” prefix are skipped, so repeated runs never stack prefixes on the same file.
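The branching in this step can be summarized as a small decision function. This is only a sketch of the logic described above: the function names and the shape of the returned object are hypothetical, and the actual trashing and renaming are done by the Google Drive nodes, not by this code.

```javascript
const DUPLICATE_PREFIX = 'DUPLICATE-';

function isFlagged(name) {
  return name.startsWith(DUPLICATE_PREFIX);
}

// Hypothetical helper: decides what should happen to one duplicate file.
function planAction(file, action) {
  if (action === 'trash') {
    return { id: file.id, op: 'trash' };
  }
  if (isFlagged(file.name)) {
    // Already renamed on an earlier run: leave it alone.
    return { id: file.id, op: 'skip' };
  }
  return { id: file.id, op: 'rename', name: DUPLICATE_PREFIX + file.name };
}
```

Checking the prefix before renaming keeps the flag branch idempotent, which matters because the trigger fires again every time a new file lands in the folder.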

Step 9: No Operation for Already Flagged Files

The No Operation node is a placeholder that ends the branch for files that are already flagged, so they are not renamed again.

Customizations ✏️

  • Change Deduplication Priority: In the Config Set node, modify the keep field between “first” and “last” to control which version stays.
  • Switch Between Trash and Flag: In Config, set action to “trash” to move duplicates to the trash automatically instead of flagging them.
  • Expand Folder Scope: Remove or adjust the folder filter in the Working Folder Google Drive node to scan the entire drive instead of one folder.
  • Adjust Polling Frequency: In Google Drive Trigger, change the poll times to check more or less frequently to suit your workflow needs.

Troubleshooting 🔧

Problem: “No files detected by Google Drive Trigger”

Cause: A wrong folder ID or insufficient permissions can cause the trigger to miss new files.

Solution: Double-check the folder ID in the Google Drive Trigger node. Make sure the OAuth2 credentials can read the folder (renaming and trashing additionally require write access). Test the trigger by uploading a new file.

Problem: “Duplicates are not flagged or trashed properly”

Cause: Incorrect parameter values in the Config Set node or a malformed file name update.

Solution: Verify that the keep and action values in the Config node match the expected options. Confirm that the renaming pattern in the Google Drive Update node is correct and that files already flagged start with “DUPLICATE-”.

Pre-Production Checklist ✅

  • Verify that the Google Drive OAuth2 credentials in n8n are active and have the required scopes.
  • Confirm that the folder ID in the trigger matches the exact folder you want to monitor.
  • Test the trigger by uploading a test file to confirm detection.
  • Run the workflow with sample files containing duplicates to validate correct identification and handling.
  • Back up important files before deployment in case of accidental deletion.
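When validating with sample files, it can help to see which files share a checksum before any renaming or trashing is enabled. The sketch below groups file records by md5Checksum and reports only the groups with more than one member. The function name and the sample data are invented for illustration; the input is assumed to be the file list returned by the Google Drive search step.

```javascript
// Hypothetical dry-run helper: lists groups of files with identical content.
function groupByChecksum(files) {
  const groups = new Map();
  for (const file of files) {
    if (!file.md5Checksum) continue; // nothing to compare
    const names = groups.get(file.md5Checksum) || [];
    names.push(file.name);
    groups.set(file.md5Checksum, names);
  }
  return [...groups.entries()]
    .filter(([, names]) => names.length > 1)
    .map(([md5Checksum, names]) => ({ md5Checksum, names }));
}

// Invented sample data.
const report = groupByChecksum([
  { name: 'logo.png', md5Checksum: 'aaa' },
  { name: 'logo (1).png', md5Checksum: 'aaa' },
  { name: 'brief.pdf', md5Checksum: 'bbb' },
]);
console.log(report);
```

If the groups in the report match the duplicates you planted in the test folder, the detection step is behaving as expected and the action can be switched on with more confidence.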

Deployment Guide

Activate the workflow in n8n once tested. Monitor the execution logs periodically for errors. If you chose “trash” as the action, review the Google Drive trash occasionally: files trashed by mistake can be restored within Google Drive’s 30-day retention period.

This workflow requires minimal maintenance, but keeping an eye on the logs helps catch problems early.

FAQs

Q: Can this workflow handle nested folders?
A: By default, it only operates on one folder level. You can customize the Google Drive filter node to remove the folder limit if needed.

Q: Does it delete files permanently?
A: Files moved to trash remain for 30 days before permanent deletion, allowing recovery if needed.

Q: Can I use this with shared drives?
A: This workflow focuses on files owned by the configured user and in specific folders. Additional configuration is needed for shared drives.

Conclusion

By following this guide, you’ve created a tailored n8n automation that efficiently manages duplicate files in your Google Drive folder. Sarah now enjoys a cleaner workspace, saves approximately 3 hours per week, and reduces error risks in file management.

Next, you might explore automations that archive old project files or notify collaborators of changes. Keep improving your workflow with n8n’s powerful automation capabilities to stay productive and organized.
