Automate Book Data Scraping, CSV Export & Email with n8n

Learn how to automate scraping book data from URLs, converting it to CSV, and emailing it using n8n. This workflow fetches book details, sorts by price, and sends results seamlessly to save hours in manual data collection.
Workflow Identifier: 1732
Nodes in Use: Google Sheets Trigger, HTTP Request, HTML Extract, Split Out, Sort, Convert to File, Gmail

Opening Problem Statement 📚

Meet Emily, a market researcher who frequently analyzes online bookstores for pricing trends and inventory data. Every week, Emily wastes 4-5 hours copying book titles, prices, and details from websites manually into spreadsheets for her reports. Not only is this tedious and time-consuming, but manual copying often leads to errors in data entry, causing delays and incorrect insights. Emily needs a way to automatically pull structured book information from any given URL, sort it by price, save it cleanly, and send it out to her team as easily accessible files. Without automation, Emily continues to lose valuable time and risks making costly mistakes in her research.

What This Automation Does ⚙️

This n8n workflow transforms the tedious task of manually scraping book data into a fully automated process. Here’s what happens when the workflow runs:

  • Detects new URLs added in a designated Google Sheets spreadsheet.
  • Fetches the entire webpage content of the bookstore URL using Dumpling AI’s scraping API to get clean HTML.
  • Extracts individual book entries from the webpage by targeting specific CSS selectors matching book list items.
  • Parses each book’s title and price from the HTML cleanly for structured data.
  • Sorts all books by price in descending order for easy price analysis.
  • Converts the data into a CSV file ready to download or share.
  • Emails the CSV file as an attachment automatically via Gmail to a pre-set recipient.

By automating these steps, Emily gets back the 4-5 hours she spent on this each week and eliminates the errors that come with manual copy-paste.

Prerequisites ⚙️

  • 📊 Google Sheets account with a sheet to track URLs
  • 🔐 Dumpling AI API credentials for scraping clean HTML content
  • 📧 Gmail account configured with OAuth2 for sending emails
  • 🔑 n8n account with access to Google Sheets Trigger, HTTP Request, HTML Extract, Split Out, Sort, Convert to File, and Gmail nodes.
  • Optional: A self-hosted n8n instance, for example for enterprise needs

Step-by-Step Guide to Building This Workflow ✏️

1. Set Up Trigger to Watch New URLs in Google Sheets

In n8n, add a new Google Sheets Trigger node.

Configure it to watch the specific spreadsheet and sheet where you’ll add book store URLs. For example, configure it with:

  • Document ID: Link to your Google Sheet
  • Sheet Name: Usually “Sheet1” or your custom sheet name
  • Event: Choose Row Added so the workflow starts on new URL entries
  • Poll Times: Set to every minute for near real-time triggering

Save the node and verify that the trigger picks up new rows correctly.

Common Mistake: Forgetting to grant n8n permission to access your Google Sheets account.
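To make the "Row Added" polling idea concrete, here is a minimal Python sketch of one way a poller can tell which rows are new: it remembers how many rows it saw last time and returns only what was appended since. This is an illustration, not n8n's actual implementation; the function name and the row shape (a dict with a url key) are assumptions.

```python
def rows_added_since(rows, last_seen_count):
    """Return the rows appended since the previous poll and the new row count.

    `rows` is the full list of data rows currently in the sheet (hypothetical
    shape: one dict per row). `last_seen_count` is how many rows the previous
    poll saw.
    """
    new_rows = rows[last_seen_count:]
    return new_rows, len(rows)


# First poll saw one row; a second URL has been added since then.
sheet_rows = [
    {"url": "https://example.com/books/page-1"},
    {"url": "https://example.com/books/page-2"},
]
new_rows, seen = rows_added_since(sheet_rows, last_seen_count=1)
print(new_rows)  # only the row added after the previous poll
```

Each returned row would start one run of the workflow, with its url value handed to the scraping step.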

2. Scrape Website Content Using Dumpling AI API

Add an HTTP Request node and set it to make a POST request to https://app.dumplingai.com/api/v1/scrape.

Configure the JSON body like so:

{
  "url": "{{ $('Trigger- Watches For new URL in Spreadsheet').item.json.url }}",
  "format": "html",
  "cleaned": "True"
}

This requests the full, cleaned HTML of the provided URL from the Dumpling AI service. Remember to add the authorization header the API requires, carrying your Dumpling AI API key.

Expected Outcome: You receive the entire web page’s filtered HTML in the response.

Common Mistake: Not properly referencing the URL from the trigger node. The node name inside the expression must match your trigger node's name exactly, and the field name (url) must match the column header in your sheet.
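If you want to test the scrape call outside n8n first, the request from this step can be sketched in Python as below. The endpoint and body fields come from the step above; the Bearer-style Authorization header, the function name, and the example URL are assumptions, so check the Dumpling AI documentation for the exact header format.

```python
import json
import urllib.request

SCRAPE_ENDPOINT = "https://app.dumplingai.com/api/v1/scrape"


def build_scrape_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request described in this step."""
    body = {"url": page_url, "format": "html", "cleaned": "True"}
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API expects a Bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_scrape_request("https://example.com/books", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the body and headers before spending API credits.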

3. Extract All Book Entries from the Page

Add an HTML Extract node to pull every book entry out of the page in one pass.

Configure it to extract all elements that match the CSS selector .row > li, which corresponds to the list of books on the page.

Name the extraction key books (the next step splits on this field) and choose to return an array of HTML snippets, one per book.

Outcome: You get a list of raw HTML segments, each for one book.

Common mistake: Using an incorrect CSS selector that misses book elements or captures unrelated data.
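To see what the .row > li selector actually picks up, here is a rough Python equivalent built on the standard library's HTMLParser. It is a sketch that assumes reasonably well-formed markup, not a replacement for the HTML Extract node; the class and function names are made up for illustration.

```python
import html
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class RowItemCollector(HTMLParser):
    """Collect the inner HTML of each <li> directly under a class="row" element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # (tag, has_row_class) for every open element
        self.li_depth = None  # len(open_tags) when the captured <li> opened
        self.parts = []
        self.books = []

    def handle_starttag(self, tag, attrs):
        if self.li_depth is not None:
            self.parts.append(self.get_starttag_text())
        elif tag == "li" and self.open_tags and self.open_tags[-1][1]:
            self.li_depth = len(self.open_tags)  # start capturing this <li>
        if tag not in VOID_TAGS:
            classes = (dict(attrs).get("class") or "").split()
            self.open_tags.append((tag, "row" in classes))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a new level.
        if self.li_depth is not None:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_tags:
            return
        self.open_tags.pop()
        if self.li_depth is None:
            return
        if len(self.open_tags) == self.li_depth:  # the captured <li> closed
            self.books.append("".join(self.parts).strip())
            self.parts, self.li_depth = [], None
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.li_depth is not None:
            self.parts.append(html.escape(data, quote=False))


def extract_book_snippets(page_html: str) -> list:
    collector = RowItemCollector()
    collector.feed(page_html)
    collector.close()
    return collector.books
```

List items that are not direct children of a .row element, such as navigation menus, are skipped, which is exactly why the child combinator (>) matters in the selector.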

4. Split the Book Array Into Individual Items

Use the Split Out node to split the array of book HTML into separate items so each book goes individually to the next processing step.

Set the field to split out as books.

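Outcome: Each book is now a separate item, so the next node processes the books one at a time.

Common Mistake: Specifying the wrong field to split out. The field name must match the key that holds the array in the previous node (books here); otherwise later nodes receive no usable data.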

5. Extract Title and Price from Each Book

Add another HTML Extract node to process each individual book HTML block.

Set extraction keys:

  • title: Extract from h3 > a element’s title attribute.
  • price: Extract the text content of the .price_color class.

This converts messy HTML into clean JSON objects like { title: "Book Name", price: "£51.77" }.

Common Mistake: Trying to extract attributes or content from wrong selectors.
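The two extraction rules from this step can be tried on a single book snippet with the Python sketch below. It mirrors the h3 > a title attribute and the .price_color text, and assumes the price element contains only text; the names are illustrative and the sample snippet is invented.

```python
from html.parser import HTMLParser


class BookFieldParser(HTMLParser):
    """Pull the title attribute of h3 > a and the text of .price_color."""

    def __init__(self):
        super().__init__()
        self.path = []          # names of the currently open tags
        self.in_price = False
        self.title = None
        self.price_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # h3 > a: an <a> whose direct parent is an <h3>
        if tag == "a" and self.path and self.path[-1] == "h3" and self.title is None:
            self.title = attributes.get("title")
        if "price_color" in (attributes.get("class") or "").split():
            self.in_price = True
        self.path.append(tag)

    def handle_endtag(self, tag):
        self.in_price = False
        # Pop back to the matching tag; this also discards unclosed tags like <img>.
        while self.path and self.path.pop() != tag:
            pass

    def handle_data(self, data):
        if self.in_price:
            self.price_parts.append(data)


def extract_book(snippet: str) -> dict:
    parser = BookFieldParser()
    parser.feed(snippet)
    parser.close()
    return {"title": parser.title, "price": "".join(parser.price_parts).strip()}


snippet = ('<article><img src="cover.jpg"><h3><a href="book.html" '
           'title="Book Name">Book Na...</a></h3>'
           '<p class="price_color">£51.77</p></article>')
print(extract_book(snippet))
```

Note that the title comes from the attribute rather than the link text, because listing pages often truncate the visible text.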

6. Sort Books by Price

Add the Sort node after the second HTML Extract node. It sorts all incoming book items together, so no separate aggregation step is needed.

Configure to sort descending on the price field.

Outcome: The data gets ordered from the highest to lowest price for better analysis.

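Common Mistake: Sorting on the raw price string. Values such as "£51.77" are compared character by character, so "£9.99" ranks above "£51.77" in descending order. Convert prices to numbers before sorting.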
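The difference between sorting price strings and sorting numeric prices is easy to demonstrate. The Python sketch below uses invented sample data and a hypothetical price_to_number helper that strips everything except digits and the decimal point; inside n8n you would do the equivalent conversion in a node placed before Sort.

```python
import re


def price_to_number(price: str) -> float:
    """Turn a price string like '£51.77' into the float 51.77."""
    return float(re.sub(r"[^0-9.]", "", price))


books = [
    {"title": "A", "price": "£9.99"},
    {"title": "B", "price": "£51.77"},
    {"title": "C", "price": "£13.50"},
]

# Descending sort on the raw strings compares character by character.
by_string = sorted(books, key=lambda book: book["price"], reverse=True)

# Descending sort on the numeric value gives the intended order.
by_number = sorted(books, key=lambda book: price_to_number(book["price"]), reverse=True)

print([book["price"] for book in by_string])  # ['£9.99', '£51.77', '£13.50']
print([book["price"] for book in by_number])  # ['£51.77', '£13.50', '£9.99']
```

The helper assumes a dot as the decimal separator and no thousands separators; prices formatted differently would need their own parsing rule.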

7. Convert JSON Data to CSV File

Add the Convert to File node.

Set it to convert the sorted JSON array to CSV format.

Outcome: You generate a CSV file that can be easily downloaded or attached to emails.
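For reference, the conversion this node performs is roughly what the following Python sketch does with the standard csv module. The column order and function name are assumptions made for illustration; the node's exact output formatting may differ.

```python
import csv
import io


def books_to_csv(books) -> str:
    """Serialize a list of {'title', 'price'} dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()


sorted_books = [
    {"title": "Book B", "price": "£51.77"},
    {"title": "Book A, Second Edition", "price": "£9.99"},
]
print(books_to_csv(sorted_books))
```

Titles that contain commas are quoted automatically, which is one reason to use a CSV writer instead of joining fields by hand.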

8. Send the CSV File via Gmail

Add a Gmail node to send an email automatically.

Fill in the recipient email, subject line (e.g., “Bookstore CSV”), and body message.

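Attach the CSV file from the previous node in the attachments section by entering the name of the binary field that holds it (typically data, unless you changed the output field in Convert to File).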

Outcome: The intended recipient receives a neatly formatted CSV report without any manual emailing.

Common Mistake: Not setting up Gmail OAuth2 credentials properly, causing authentication failures.
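To picture what the Gmail node ends up sending, here is a Python sketch that builds an equivalent message with the CSV attached, using the standard email library. It only constructs the message and does not send it; the addresses, body text, and file name are placeholders, and the Gmail node handles authentication and delivery itself.

```python
from email.message import EmailMessage


def build_report_email(csv_text: str, recipient: str, sender: str) -> EmailMessage:
    """Build an email with the CSV report attached as books.csv."""
    message = EmailMessage()
    message["To"] = recipient
    message["From"] = sender
    message["Subject"] = "Bookstore CSV"
    message.set_content("The latest book price report is attached.")
    # Passing a str with subtype="csv" creates a text/csv attachment in UTF-8.
    message.add_attachment(csv_text, subtype="csv", filename="books.csv")
    return message


report = build_report_email(
    "title,price\nBook B,£51.77\n",
    recipient="team@example.com",    # placeholder address
    sender="emily@example.com",      # placeholder address
)
for part in report.iter_attachments():
    print(part.get_filename(), part.get_content_type())
```

Checking the attachment name and content type this way is a quick sanity test before wiring the real credentials into the workflow.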

Customizations ✏️

  1. Change the CSS Selector in the “Extract all books from the page” HTML node to scrape other types of list items, like different product types or a new website structure.
  2. Modify Sort Order by changing the Sort node field from “price” descending to ascending or sorting by title alphabetically for different data insights.
  3. Add Additional Fields to extract more information from each book, such as author or rating, by extending the extraction keys in the “Extract individual book price” node.
  4. Adjust Email Recipient in the Gmail node to send reports to different stakeholders dynamically using expressions based on the URL owner.
  5. Add Google Sheets Append node between extraction and CSV conversion to store raw data back to a spreadsheet for archival.

Troubleshooting 🔧

Problem: “No data received from Dumpling AI scraping node”
Cause: API key or endpoint misconfigured, or URL format incorrect.
Solution: Double-check credentials, headers, and URL in the HTTP Request node. Ensure Dumpling AI key is active.

Problem: “Gmail node authentication failed”
Cause: OAuth2 token expired or misconfigured.
Solution: Refresh or reauthorize the Gmail credentials in n8n settings.

Problem: “HTML extraction returns empty array”
Cause: Incorrect or outdated CSS selector.
Solution: Inspect the target website source to update the CSS selector in the HTML Extract node appropriately.

Pre-Production Checklist ✅

  • Verify Google Sheets Trigger correctly detects new rows with valid URLs.
  • Confirm Dumpling AI HTTP Request returns clean HTML content.
  • Test HTML Extract nodes capture the expected book elements and fields.
  • Ensure sorting orders data correctly by price.
  • Test email delivery with sample CSV files attached.

Deployment Guide

Once all nodes are configured, activate your workflow by toggling the active switch in n8n.

Monitor the workflow executions via the n8n dashboard to ensure URLs are processed properly.

Set up alerts or monitoring hooks if you expect high volume or need notification on failures.

FAQs

Q: Can I replace Dumpling AI with another scraper?
A: Yes. Any scraping service that returns clean HTML can take its place. Point the HTTP Request node at that service's endpoint and adjust the body and headers to match its API.

Q: Does sending emails consume Gmail API quotas?
A: Yes. Each message the workflow sends counts against your Gmail sending limits, so make sure the account has sufficient quota and is authorized correctly.

Q: Is my data safe within this workflow?
A: n8n only processes data within your setup; external API security depends on Dumpling AI and Gmail providers.

Conclusion

By following this workflow, you’ve built an automated pipeline that takes URLs from a spreadsheet, scrapes book data cleanly using Dumpling AI, processes and sorts entries by price, then emails a nicely formatted CSV report. This saves hours of manual data entry, reduces errors, and speeds up reporting for market research or bookkeeping needs. As next steps, consider adding Google Sheets logging, enhancing extraction with additional book details, or integrating notification systems like Slack to alert when a new report is sent. You’ve taken a big step toward mastering automated data scraping and emailing with n8n!
