Opening Problem Statement
Meet Sarah, a busy digital marketer who manages numerous project files on Google Drive. Over time, she noticed her Google Drive getting congested with multiple copies of the same files — cluttering her folders, making it hard to find the right versions, and wasting precious time every day trying to manually identify and remove duplicates. This not only slows her workflow but also risks version confusion in collaborative projects.
Manually cleaning duplicates in Google Drive is tedious and error-prone. Sarah estimates she wastes at least 30 minutes daily hunting for duplicates and cleaning them up, translating to nearly 3 hours per week lost — time she could spend on creative and strategic tasks.
Wouldn’t it be great if this tedious cleanup could be automated with precise control? Enter the powerful n8n workflow that Sarah uses to automatically detect duplicate files in a specified Google Drive folder — then either flags them with a “DUPLICATE-” prefix or sends them to trash, depending on her chosen preference. Let’s dive into how this works and how you can implement it yourself.
What This Automation Does
This n8n workflow is specifically tailored to Google Drive file deduplication within a chosen folder. Once deployed, here’s exactly what happens when it runs:
- The Google Drive Trigger polls a defined folder every 15 minutes for newly created files.
- The workflow fetches all files owned by the specified user in that folder, excluding Google Apps files such as Docs or Sheets, which have no MD5 checksum and therefore can’t be compared directly.
- Based on the configuration, it sorts files either by newest or oldest first to decide which duplicate to keep.
- It checks for duplicate files using the MD5 checksum — a reliable fingerprint of file content — marking duplicates accordingly.
- Depending on the configuration, duplicates are either renamed with a “DUPLICATE-” prefix (flagged) or sent straight to the Google Drive trash.
- Files already flagged aren’t flagged again, preventing repetitive processing.
Sarah no longer wastes time on manual duplicate cleanup and avoids accidental deletion of her latest or earliest edits. This automation saves her about 3 hours per week and drastically reduces version confusion.
Prerequisites ⚙️
- Google Drive Account with files to deduplicate 📁
- Google Drive OAuth2 credentials configured in n8n to authorize file operations 🔐
- n8n account, cloud or self-hosted. For self-hosting, consider reliable providers like Hostinger for smooth operations 🔌
- Basic knowledge of the Google Drive folder structure is helpful but not mandatory.
Step-by-Step Guide
Step 1: Set Up the Google Drive Trigger
In the n8n editor, click Add Node → search for and select Google Drive Trigger. Configure it as follows:
- Event: fileCreated
- Poll Times: every 15 minutes (default)
- Folder to Watch: Choose the specific Google Drive folder ID you want to monitor, e.g., your project folder.
You should see the trigger linked and polling the folder, ready to detect new files. A common mistake is selecting the wrong folder ID, in which case no files are detected.
Step 2: Configure Parameters with the Set Node
Add a Set node named Config to define parameters:
- keep: Choose “last” to keep the latest file or “first” to keep the oldest.
- action: Choose between “flag” to rename duplicates or “trash” to move them to the Google Drive trash.
- owner: This is auto-filled with the file owner’s email from the trigger.
- folder: Folder ID obtained from trigger or manually set.
This node controls your deduplication behavior. Forgetting to set these or setting invalid values will cause the workflow to malfunction.
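Because invalid values in Config break the rest of the workflow, it can help to check them early. The following is a minimal sketch, not part of the original workflow: the helper name validateConfig is hypothetical, and it assumes the four fields described above (keep, action, owner, folder) arrive as a plain object.

```javascript
// Hypothetical helper (not part of the original workflow): checks the
// Config values described in Step 2 and returns a list of problems.
const KEEP_OPTIONS = ["first", "last"];
const ACTION_OPTIONS = ["flag", "trash"];

function validateConfig(config) {
  const errors = [];
  if (!KEEP_OPTIONS.includes(config.keep)) {
    errors.push(`keep must be one of: ${KEEP_OPTIONS.join(", ")}`);
  }
  if (!ACTION_OPTIONS.includes(config.action)) {
    errors.push(`action must be one of: ${ACTION_OPTIONS.join(", ")}`);
  }
  if (!config.owner) {
    errors.push("owner is empty");
  }
  if (!config.folder) {
    errors.push("folder is empty");
  }
  return errors; // an empty array means the configuration looks usable
}
```

An empty result means the values match the options this guide expects. Anything else is worth fixing before you activate the workflow.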
Step 3: Retrieve Files with Google Drive Node
Insert a Google Drive node configured as follows:
- Operation: Search for files
- Filter: Set folderId to the configured folder and use queryString to limit files owned by your specified owner.
- Return All: enabled
This fetches all relevant files to scan for duplicates. Common errors include incorrect query syntax or folder ID.
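Since query syntax is a common source of errors here, the sketch below shows one way to assemble a search string in the Google Drive API query language. It is an illustration under assumptions: the helper names are hypothetical, and the exact string the original workflow uses is not shown in this guide. If the node’s folder filter is already set, only the owners clause may be needed in queryString.

```javascript
// Hypothetical helpers: build a Drive search query for one folder and owner.
// Backslashes and single quotes inside values are escaped with a backslash.
function escapeQueryValue(value) {
  return String(value).replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

function buildDriveQuery(folderId, ownerEmail) {
  return [
    `'${escapeQueryValue(folderId)}' in parents`, // files directly in the folder
    `'${escapeQueryValue(ownerEmail)}' in owners`, // owned by the configured user
    "trashed = false", // ignore files already in the trash
  ].join(" and ");
}
```

For example, buildDriveQuery("abc123", "sarah@example.com") produces a string with three clauses joined by “and”. Both arguments are placeholder values.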
Step 4: Exclude Google Apps files
Add a Filter node named Drop Google Apps files that:
- Checks file MIME types and excludes those starting with application/vnd.google-apps, such as Docs, Sheets, or Slides.
This ensures only binary files are processed, which have MD5 hashes available.
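The same check can be written in a few lines of JavaScript, which may help if you want to test the condition outside the Filter node. This is a sketch only: the function name is hypothetical, and it assumes each file object carries a mimeType field.

```javascript
// Hypothetical equivalent of the "Drop Google Apps files" filter.
const GOOGLE_APPS_PREFIX = "application/vnd.google-apps";

function dropGoogleAppsFiles(files) {
  // Keep only files whose MIME type does not start with the Google Apps prefix.
  return files.filter(
    (file) => !String(file.mimeType || "").startsWith(GOOGLE_APPS_PREFIX)
  );
}
```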
Step 5: Choose Deduplication Method
Add a Switch node named Keep First/Last that branches on the keep parameter set earlier:
- If keep is “last”, run the Deduplicate Keep Last Code node.
- If keep is “first”, run the Deduplicate Keep First Code node.
Each Code node sorts files by creation date and marks duplicates using the MD5 checksum:
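// Deduplicate Keep Last example snippet
// Sort newest first so the most recent copy of each file is seen first and kept.
const sorted = items.sort((a, b) =>
  new Date(b.json.createdTime) - new Date(a.json.createdTime));

// MD5 checksums that have already been encountered.
const seen = new Set();

for (const item of sorted) {
  const md5 = item.json.md5Checksum;
  if (!md5) {
    // No checksum available, so the file is never treated as a duplicate.
    item.json.isDuplicate = false;
    continue;
  }
  // Every later occurrence of a known checksum is a duplicate.
  item.json.isDuplicate = seen.has(md5);
  seen.add(md5);
}

return items;

This logic robustly identifies duplicates. Missing or empty MD5 fields are safely ignored.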
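Only the Keep Last variant is shown in this guide. As a sketch of what the Keep First branch could look like, the version below sorts oldest first so that the earliest copy of each checksum survives. The function name is hypothetical, and it assumes the same item shape as the snippet above (json.createdTime and json.md5Checksum).

```javascript
// Hypothetical Keep First variant: the oldest file with a given MD5 is kept,
// and every newer file with the same MD5 is marked as a duplicate.
function markDuplicatesKeepFirst(items) {
  // Copy before sorting so the caller's array order is left untouched.
  const sorted = [...items].sort(
    (a, b) => new Date(a.json.createdTime) - new Date(b.json.createdTime)
  );
  const seen = new Set();
  for (const item of sorted) {
    const md5 = item.json.md5Checksum;
    // Files without a checksum are never marked as duplicates.
    item.json.isDuplicate = Boolean(md5) && seen.has(md5);
    if (md5) {
      seen.add(md5);
    }
  }
  return sorted;
}
```

Inside a Code node, the last line would presumably be a return of markDuplicatesKeepFirst(items). Treat that wiring as an assumption and verify it against your own workflow.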
Step 6: Prepare Duplicate Metadata
The Edit Fields Set node copies and formats metadata fields, including the duplicate flag and essential file info for the next processing steps.
Step 7: Filter Only Duplicates
Add a Filter node set to pass only items where isDuplicate is true. This limits downstream actions to actual duplicates only.
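Steps 6 and 7 together amount to a map followed by a filter. The sketch below illustrates that idea in plain JavaScript. The exact fields copied by the Edit Fields node are not listed in this guide, so the field selection here (id, name, md5Checksum, createdTime, isDuplicate) is an assumption, as is the function name.

```javascript
// Hypothetical sketch of Steps 6 and 7: keep a small set of fields,
// then pass along only the items flagged as duplicates.
function selectDuplicates(items) {
  return items
    .map((item) => ({
      id: item.json.id,
      name: item.json.name,
      md5Checksum: item.json.md5Checksum,
      createdTime: item.json.createdTime,
      isDuplicate: item.json.isDuplicate === true,
    }))
    .filter((record) => record.isDuplicate);
}
```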
Step 8: Decide Duplicate Handling – Trash or Flag
Insert another Switch node named Trash/Flag Duplicates based on the action parameter:
- Trash branch: sends duplicates to the Send Duplicates to Trash Google Drive node, which moves each file to the Google Drive trash.
- Flag branch: runs an If node Is Flagged to check if files already start with “DUPLICATE-” to avoid multiple renaming, then renames duplicates by prefixing their name with “DUPLICATE-” via the Google Drive Update node.
Files with the “DUPLICATE-” prefix are skipped from further renaming to prevent loops.
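The branching in this step can be summarized as a small decision function. This is a sketch for illustration, not the workflow’s actual nodes: the function names and the returned object shape are hypothetical, while the “DUPLICATE-” prefix and the “flag” and “trash” values come from this guide.

```javascript
// Hypothetical summary of Step 8: decide what happens to one duplicate file.
const DUPLICATE_PREFIX = "DUPLICATE-";

function isFlagged(name) {
  return name.startsWith(DUPLICATE_PREFIX);
}

function planDuplicateAction(file, action) {
  if (action === "trash") {
    return { type: "trash", id: file.id };
  }
  if (isFlagged(file.name)) {
    // Already renamed on an earlier run, so leave it alone.
    return { type: "skip", id: file.id };
  }
  return { type: "rename", id: file.id, newName: DUPLICATE_PREFIX + file.name };
}
```

The skip case matters because the workflow runs repeatedly. Without it, a flagged file would gain another prefix on every run.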
Step 9: No Operation for Already Flagged Files
The No Operation node is a placeholder that ends the branch for files that are already flagged, so they are not renamed again.
Customizations ✏️
- Change Deduplication Priority: In the Config Set node, modify the keep field between “first” and “last” to control which version stays.
- Switch Between Trash and Flag: In Config, set action to “trash” to auto-delete duplicates instead of flagging them.
- Expand Folder Scope: Remove or adjust the folder filter in the Working Folder Google Drive node to scan the entire drive instead of one folder.
- Adjust Polling Frequency: In Google Drive Trigger, change the poll times to check more or less frequently to suit your workflow needs.
Troubleshooting 🔧
Problem: “No files detected by Google Drive Trigger”
Cause: Wrong folder ID or incorrect permissions may cause the trigger to miss new files.
Solution: Double-check the folder ID in the Google Drive Trigger node. Ensure the OAuth2 credentials have ‘file read’ permissions. Test trigger with a new file upload.
Problem: “Duplicates are not flagged or trashed properly”
Cause: Incorrect configuration in Set node parameters or malformed filename updates.
Solution: Verify the keep and action values in the Config node match expected options. Confirm the Google Drive Update node renaming pattern is correct and that duplicates already flagged start with “DUPLICATE-”.
Pre-Production Checklist ✅
- Verify Google Drive OAuth2 credentials in n8n are active and have required scopes.
- Confirm the folder ID in the trigger matches the exact folder you want to monitor.
- Test the trigger by uploading a test file to confirm detection.
- Run the workflow with sample files containing duplicates to validate correct identification and handling.
- Back up important files before deployment in case of accidental deletion.
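Before running the workflow against real files, you may want a read-only preview of which files share a checksum. The sketch below groups a file list by MD5 and reports only the groups with more than one member. It is a hypothetical helper, not part of the workflow, and it assumes each file object has name and md5Checksum fields.

```javascript
// Hypothetical dry-run helper: lists groups of files that share an MD5
// checksum without renaming or trashing anything.
function summarizeDuplicates(files) {
  const groups = new Map();
  for (const file of files) {
    if (!file.md5Checksum) {
      continue; // no checksum, so the file cannot be compared
    }
    const names = groups.get(file.md5Checksum) || [];
    names.push(file.name);
    groups.set(file.md5Checksum, names);
  }
  return [...groups.entries()]
    .filter(([, names]) => names.length > 1)
    .map(([md5Checksum, names]) => ({ md5Checksum, names }));
}
```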
Deployment Guide
Activate the workflow in n8n once tested. Monitor the execution logs periodically for errors. If you chose “trash” as action, review Google Drive’s trash folder occasionally to restore mistakenly deleted files within Google Drive’s 30-day retention.
This workflow requires minimal maintenance but keeping an eye on logs ensures smooth operations.
FAQs
Q: Can this workflow handle nested folders?
A: By default, it only operates on one folder level. You can customize the Google Drive filter node to remove the folder limit if needed.
Q: Does it delete files permanently?
A: Files moved to trash remain for 30 days before permanent deletion, allowing recovery if needed.
Q: Can I use this with shared drives?
A: This workflow focuses on files owned by the configured user and in specific folders. Additional configuration is needed for shared drives.
Conclusion
By following this guide, you’ve created a tailored n8n automation that efficiently manages duplicate files in your Google Drive folder. Sarah now enjoys a cleaner workspace, saves approximately 3 hours per week, and reduces error risks in file management.
Next, you might explore automations that archive old project files or notify collaborators of changes. Keep improving your workflow with n8n’s powerful automation capabilities to stay productive and organized.