Opening Problem Statement 📚
Meet Emily, a market researcher who frequently analyzes online bookstores for pricing trends and inventory data. Every week, Emily wastes 4-5 hours manually copying book titles, prices, and details from websites into spreadsheets for her reports. Not only is this tedious and time-consuming, but manual copying often introduces data-entry errors, causing delays and incorrect insights. Emily needs a way to automatically pull structured book information from any given URL, sort it by price, save it cleanly, and send it to her team as easily accessible files. Without automation, Emily continues to lose valuable time and risks making costly mistakes in her research.
What This Automation Does ⚙️
This n8n workflow transforms the tedious task of manually scraping book data into a fully automated process. Here’s what happens when the workflow runs:
- Detects new URLs added in a designated Google Sheets spreadsheet.
- Fetches the entire webpage content of the bookstore URL using Dumpling AI’s scraping API to get clean HTML.
- Extracts individual book entries from the webpage by targeting specific CSS selectors matching book list items.
- Parses each book’s title and price from the HTML cleanly for structured data.
- Sorts all books by price in descending order for easy price analysis.
- Converts the data into a CSV file ready to download or share.
- Emails the CSV file as an attachment automatically via Gmail to a pre-set recipient.
By automating these steps, Emily saves roughly 5 hours weekly and eliminates human errors from manual copy-paste tasks.
Prerequisites ⚙️
- 📊 Google Sheets account with a sheet to track URLs
- 🔐 Dumpling AI API credentials for scraping clean HTML content
- 📧 Gmail account configured with OAuth2 for sending emails
- 🔑 n8n account with access to Google Sheets Trigger, HTTP Request, HTML Extract, Sort, Convert to File, and Gmail nodes.
- Optional: a self-hosted n8n instance for enterprise needs
Step-by-Step Guide to Building This Workflow ✏️
1. Set Up Trigger to Watch New URLs in Google Sheets
In n8n, add a new node: Google Sheets Trigger.
Configure it to watch the specific spreadsheet and sheet where you’ll add book store URLs. For example, configure it with:
- Document ID: link to your Google Sheet
- Sheet Name: usually “Sheet1” or your custom sheet name
- Event: choose “Row Added” so the workflow starts on new URL entries
- Poll Times: set to every minute for near real-time triggering
Save and verify the trigger listens for new rows correctly. Common Mistake: Forgetting to grant n8n permission to access your Google Sheets account.
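For reference, here is a minimal sketch of the trigger's settings as they might appear in the node's JSON parameter view; exact field names vary between n8n versions, and the document ID is a placeholder:

```json
{
  "documentId": "YOUR_GOOGLE_SHEET_ID",
  "sheetName": "Sheet1",
  "event": "rowAdded"
}
```

The polling frequency is configured separately on the trigger node itself (e.g., “Every Minute”).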
2. Scrape Website Content Using Dumpling AI API
Add an HTTP Request node and set it to make a POST request to https://app.dumplingai.com/api/v1/scrape.
Configure the JSON body like so:
{
  "url": "{{ $('Trigger- Watches For new URL in Spreadsheet').item.json.url }}",
  "format": "html",
  "cleaned": true
}

This requests the full cleaned HTML content of the provided URL from the Dumpling AI service. Remember to add the required authorization header, as shown below.
Expected Outcome: You receive the entire web page’s filtered HTML in the response.
Common Mistake: Not properly referencing the URL from the trigger node. You must use the correct expression to pull the new row’s URL.
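The authorization goes in the request headers. A sketch, assuming Dumpling AI accepts a standard bearer token (confirm the exact header format in your Dumpling AI dashboard):

```json
{
  "Content-Type": "application/json",
  "Authorization": "Bearer YOUR_DUMPLING_AI_API_KEY"
}
```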
3. Extract All Book Entries from the Page
Add an HTML Extract node for mass extraction.
Configure it to extract all elements that match the CSS selector .row > li which corresponds to the list of books on the page.
Choose to return an array of HTML snippets representing each book.
Outcome: You get a list of raw HTML segments, each for one book.
Common mistake: Using an incorrect CSS selector that misses book elements or captures unrelated data.
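To make the expected shape concrete, the node's output is a single field (named books here, matching the next step) containing an array of HTML snippets, roughly like this; the inner markup is illustrative, so inspect your target site for its actual structure:

```json
{
  "books": [
    "<li><article class=\"product_pod\"><h3><a title=\"Book One\">…</a></h3><p class=\"price_color\">£51.77</p></article></li>",
    "<li><article class=\"product_pod\"><h3><a title=\"Book Two\">…</a></h3><p class=\"price_color\">£13.99</p></article></li>"
  ]
}
```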
4. Split the Book Array Into Individual Items
Use the Split Out node to split the array of book HTML into separate items so each book goes individually to the next processing step.
Set the field to split out as books.
Outcome: Each workflow execution branch now holds one book for processing.
Common mistake: Forgetting to specify the correct field to split out results in errors down the line.
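After splitting, each item carries one snippet under the same field, for example:

```json
{ "books": "<li><article class=\"product_pod\">…</article></li>" }
```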
5. Extract Title and Price from Each Book
Add another HTML Extract node to process each individual book HTML block.
Set extraction keys:
- title: extracted from the h3 > a element’s title attribute.
- price: extracted from the text content of the .price_color element.
This converts messy HTML into clean JSON objects like { title: "Book Name", price: "£51.77" }.
Common Mistake: Trying to extract attributes or content from wrong selectors.
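In the node's JSON parameter view, the two extraction values look roughly like this (parameter names approximate the HTML node's schema and may differ by version):

```json
{
  "extractionValues": [
    { "key": "title", "cssSelector": "h3 > a", "returnValue": "attribute", "attribute": "title" },
    { "key": "price", "cssSelector": ".price_color", "returnValue": "text" }
  ]
}
```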
6. Sort Books by Price
Add the Sort node after the price extraction step; n8n feeds all the individual book items into it together, so no separate aggregation node is needed.
Configure to sort descending on the price field.
Outcome: The data gets ordered from the highest to lowest price for better analysis.
Common Mistake: Sorting on a string field without first converting prices to numbers can produce incorrect ordering (for example, “£9.99” sorts above “£51.77” as text); one fix is sketched below.
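One way to avoid that, sketched under the assumption that prices look like “£51.77”: add an Edit Fields (Set) node before the Sort node that parses the string into a new numeric field (priceNumeric is a hypothetical name), then sort on that field instead:

```json
{
  "priceNumeric": "={{ parseFloat($json.price.replace('£', '')) }}"
}
```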
7. Convert JSON Data to CSV File
Add the Convert to File node.
Set it to convert the sorted JSON array to CSV format.
Outcome: You generate a CSV file that can be easily downloaded or attached to emails.
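A minimal sketch of the node's settings: pick CSV as the target format and, optionally, a file name (the exact option names are assumptions; check your node version):

```json
{
  "operation": "csv",
  "options": { "fileName": "books.csv" }
}
```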
8. Send the CSV File via Gmail
Add a Gmail node to send an email automatically.
Fill in the recipient email, subject line (e.g., “Bookstore CSV”), and body message.
Attach the CSV file from the previous node by selecting it in the attachments section.
Outcome: The intended recipient receives a neatly formatted CSV report without any manual emailing.
Common Mistake: Not setting up Gmail OAuth2 credentials properly, causing authentication failures.
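As a rough guide, the node's key parameters might look like this in JSON view; the recipient is a placeholder, and data is assumed to be the default binary property name produced by Convert to File:

```json
{
  "sendTo": "team@example.com",
  "subject": "Bookstore CSV",
  "message": "Attached is this week's sorted book price report.",
  "options": {
    "attachmentsUi": {
      "attachmentsBinary": [{ "property": "data" }]
    }
  }
}
```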
Customizations ✏️
- Change the CSS Selector in the “Extract all books from the page” HTML node to scrape other types of list items, like different product types or a new website structure.
- Modify Sort Order by changing the Sort node field from “price” descending to ascending or sorting by title alphabetically for different data insights.
- Add Additional Fields to extract more information from each book, such as author or rating, by extending the extraction keys in the “Extract individual book price” node (see the sketch after this list).
- Adjust Email Recipient in the Gmail node to send reports to different stakeholders dynamically using expressions based on the URL owner.
- Add Google Sheets Append node between extraction and CSV conversion to store raw data back to a spreadsheet for archival.
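As an example of the additional-fields customization above: if the target site encodes a book's rating in a CSS class, you could extend the extraction values from step 5 with one more entry. The selector and attribute here are assumptions to adapt to your site:

```json
{ "key": "rating", "cssSelector": "p.star-rating", "returnValue": "attribute", "attribute": "class" }
```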
Troubleshooting 🔧
Problem: “No data received from Dumpling AI scraping node”
Cause: API key or endpoint misconfigured, or URL format incorrect.
Solution: Double-check credentials, headers, and URL in the HTTP Request node. Ensure your Dumpling AI key is active.
Problem: “Gmail node authentication failed”
Cause: OAuth2 token expired or misconfigured.
Solution: Refresh or reauthorize the Gmail credentials in n8n settings.
Problem: “HTML extraction returns empty array”
Cause: Incorrect or outdated CSS selector.
Solution: Inspect the target website source to update the CSS selector in the HTML Extract node appropriately.
Pre-Production Checklist ✅
- Verify Google Sheets Trigger correctly detects new rows with valid URLs.
- Confirm Dumpling AI HTTP Request returns clean HTML content.
- Test HTML Extract nodes capture the expected book elements and fields.
- Ensure sorting orders data correctly by price.
- Test email delivery with sample CSV files attached.
Deployment Guide
Once all nodes are configured, activate your workflow by toggling the active switch in n8n.
Monitor the workflow executions via the n8n dashboard to ensure URLs are processed properly.
Set up alerts or monitoring hooks if you expect high volume or need notification on failures.
FAQs
Q: Can I replace Dumpling AI with another scraper?
A: Yes. Any service that returns clean HTML will work; just update the HTTP Request node’s endpoint, body, and authentication accordingly.
Q: Does sending emails consume Gmail API quotas?
A: Yes, ensure your Gmail account has sufficient quota and is authorized correctly.
Q: Is my data safe within this workflow?
A: n8n only processes data within your setup; external API security depends on Dumpling AI and Gmail providers.
Conclusion
By following this workflow, you’ve built an automated pipeline that takes URLs from a spreadsheet, scrapes book data cleanly using Dumpling AI, processes and sorts entries by price, then emails a nicely formatted CSV report. This saves hours of manual data entry, reduces errors, and speeds up reporting for market research or price-monitoring needs. As next steps, consider adding Google Sheets logging, enhancing extraction with additional book details, or integrating notification systems like Slack to alert your team when a new report is sent. You’ve taken a big step toward mastering automated data scraping and emailing with n8n!