How does the workflow get the list of pages to scrape?

The workflow fetches the website's sitemap XML from its URL, converts it to JSON, and extracts every URL listed in it.

Can the workflow scrape pages without a sitemap URL?

No. This workflow relies on a sitemap URL to discover pages, so scraping a site without one requires modifying the workflow.

What does the Jina.ai scraper do in the workflow?

The Jina.ai scraper extracts the page title and markdown content from each URL without needing an API key.
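As a rough illustration of pulling the title and markdown out of a scraper response, here is a minimal Python sketch. It assumes the response is plain text with a "Title:" line followed by a "Markdown Content:" marker; that layout, the function name parse_reader_response, and the sample text are assumptions for illustration, not the workflow's actual Code node.

```python
def parse_reader_response(text: str) -> dict:
    """Split a scraper response into title and markdown.

    Assumes a plain-text layout with a 'Title:' line and a
    'Markdown Content:' marker (an assumption, not confirmed by the source).
    """
    marker = "Markdown Content:"
    head, found, body = text.partition(marker)
    markdown = body.strip() if found else ""

    title = ""
    for line in head.splitlines():
        if line.startswith("Title:"):
            title = line[len("Title:"):].strip()
            break
    return {"title": title, "markdown": markdown}


# Hypothetical sample response used only for this sketch.
sample = (
    "Title: Example Agent\n\n"
    "URL Source: https://example.com/agent\n\n"
    "Markdown Content:\n"
    "# Example Agent\n\nSome text.\n"
)
page = parse_reader_response(sample)
```

If either field comes back empty, the response probably did not follow the assumed layout, and it is worth inspecting the raw text before saving anything.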

How can the user keep the saved files organized in Google Drive?

The folder ID or path in the Google Drive node controls where the markdown files are saved. Update it to point at the folder you want.

Scrape Multipage Websites with Jina.ai and n8n

What this workflow does

This workflow collects the page URLs of a website from its sitemap.

It keeps the pages whose URLs contain keywords such as “agent” or “tool”.

Then it scrapes those pages to get the title and markdown content using Jina.ai.

Finally, it saves each page as a markdown file in Google Drive.

This saves a lot of time because you do not need to copy and paste manually.

Who should use this workflow

This workflow is for people who need to collect web information quickly and often.

It mainly suits content managers or researchers who track many AI tool pages.

No heavy technical skills are needed, but basic n8n experience helps.

Tools and services used

n8n: To build and run the automation.

Jina.ai: For web page scraping without an API key.

Google Drive: To save scraped markdown files automatically.

HTTP Request node: To fetch the sitemap and call the scraper.

XML node: To parse sitemap XML into JSON.

Workflow input, processing, and output

Inputs

The workflow starts with a sitemap URL.

This URL points to an XML file that lists the pages of the target website.
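To make the input concrete, the sketch below parses a small sitemap and collects its page URLs in Python. It assumes the standard sitemap layout (a urlset of url entries, each with a loc element, in the sitemaps.org namespace); the sample XML and the function name extract_urls are made up for illustration. In n8n itself the XML node does this conversion.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in the sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text
    ]


# Hypothetical sitemap content used only for this sketch.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/agent-builder</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

urls = extract_urls(sample_sitemap)
```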

Processing steps

Fetch the sitemap XML with an HTTP Request node.

Convert the XML sitemap to JSON.

Split the JSON into a list of URLs.

Keep the URLs that contain keywords such as “agent” or “tool”, or that equal the main homepage URL.

Limit the list to the first 20 URLs.

Process the URLs in batches to avoid overloading the server.

Call the Jina.ai scraper for each URL to get the page content.

Use a Code node to parse the Jina.ai response into a clean title and markdown content.

Save each page as a markdown file in Google Drive, using the title and URL as the file name.

Wait between batches so that requests stay within the server's limits.
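The filter, limit, and batch steps above can be sketched in a few lines of Python. The keywords and the limit of 20 come from the description; the batch size of 5, the case-sensitive match, and the names select_urls and batches are assumptions for illustration, since the workflow's actual node settings are not shown here.

```python
KEYWORDS = ("agent", "tool")  # keywords named in the steps above
MAX_URLS = 20                 # limit named in the steps above
BATCH_SIZE = 5                # hypothetical batch size


def select_urls(urls, homepage, keywords=KEYWORDS, limit=MAX_URLS):
    """Keep the homepage and keyword matches, then cut to the first `limit`."""
    kept = [
        u for u in urls
        if u == homepage or any(k in u for k in keywords)
    ]
    return kept[:limit]


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical URL list: homepage, 30 tool pages, one unrelated page.
all_urls = (
    ["https://example.com/"]
    + [f"https://example.com/tool-{i}" for i in range(30)]
    + ["https://example.com/pricing"]
)
selected = select_urls(all_urls, homepage="https://example.com/")
groups = list(batches(selected))
```

In a real run, a pause between batches (for example with time.sleep) would play the role of the Wait node.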

Output

The workflow creates multiple markdown files in Google Drive.

Each file contains one scraped page: its title and its content as markdown.
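One possible way to build a file name from the title and URL, and a file body from the title and markdown, is sketched below in Python. The exact naming pattern, the character clean-up, the length cap, and the body layout are all assumptions for illustration; the workflow's Google Drive node may format these differently.

```python
import re


def make_file_name(title: str, url: str, max_length: int = 120) -> str:
    """Combine title and URL into a single safe file name ending in .md."""
    raw = f"{title} - {url}"
    # Replace each run of characters outside a conservative safe set with "_".
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return safe[:max_length] + ".md"


def make_file_body(title: str, markdown: str) -> str:
    """Put the title on top as a heading, followed by the page markdown."""
    return f"# {title}\n\n{markdown}\n"


# Hypothetical page used only for this sketch.
name = make_file_name("Example Agent", "https://example.com/agent")
body = make_file_body("Example Agent", "Some text.")
```

Including the URL in the name keeps two pages with the same title from overwriting each other in this sketch.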

Beginner step-by-step: How to use this workflow in n8n

Step 1: Import workflow

Download the workflow file using the Download button on this page.

Open the n8n editor where you want to run the automation.

Click “Import from File” and select the downloaded workflow file.

Step 2: Configure credentials and settings

Add Google Drive OAuth credentials if they are not connected yet.

Check the sitemap URL in the Set node called Set Website URL. Update it if needed.

If the Google Drive folder should change, update the folder ID in the Save Webpage Contents to Google Drive node.

Review the keyword filters in the Filter By Topics or Pages node so they match your topics.

Step 3: Test the workflow

Click the Manual Trigger node labeled “When clicking ‘Test workflow’”.

Run the workflow once. Watch for errors or empty results.

Step 4: Activate workflow for production

Switch the workflow to Active mode in n8n.

For regular scraping, add a Schedule Trigger node so the workflow runs on a timer. The manual trigger runs only when you click it.

Monitor the Google Drive folder to see new scraped markdown files.

If you self-host n8n, check the available self-hosting options.

Customization ideas

Add more keywords in the filter node to capture different page topics.

Change the limit node to handle more or fewer URLs depending on needs.

Change Google Drive folder path to keep files organized.

Insert extra code nodes for additional content cleanup or analytics.

Edge cases and failures

404 error fetching sitemap

The sitemap URL might be wrong, or the site might not publish a sitemap.

Check that the URL opens in a browser before running the workflow.

Empty or wrong data from Jina.ai scrape

Check that URLs sent to Jina.ai are complete and correct.

Test by calling the Jina.ai endpoint manually with a sample URL.
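A quick way to check that a URL is complete before sending it to the scraper is sketched below in Python. It assumes the scraper is called by putting the reader prefix https://r.jina.ai/ in front of the page URL; that prefix and the names is_complete_url and reader_url are assumptions for this sketch, so confirm the endpoint against the HTTP Request node in your copy of the workflow.

```python
from urllib.parse import urlparse

# Assumed reader endpoint prefix; verify against the workflow's HTTP Request node.
READER_PREFIX = "https://r.jina.ai/"


def is_complete_url(url: str) -> bool:
    """True when the URL has an http(s) scheme and a host name."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def reader_url(page_url: str) -> str:
    """Build the scraper request URL, rejecting incomplete page URLs."""
    if not is_complete_url(page_url):
        raise ValueError(f"Incomplete URL: {page_url!r}")
    return READER_PREFIX + page_url.strip()


# Hypothetical page URL used only for this sketch.
request_url = reader_url("https://example.com/agent")
```

Opening the resulting request URL in a browser is one way to do the manual test described above.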

Google Drive file save errors

Verify that the Google Drive credentials in n8n have the proper access rights.

Reauthenticate if a permission issue occurs.

Summary

✓ Automates downloading many related pages listed in a site's sitemap.

✓ Filters pages for relevance by keyword.

✓ Extracts the title and markdown content of each page using Jina.ai, without an API key.

✓ Saves each page as a markdown file in Google Drive for easy access.

→ Saves the user many hours of manually copying and checking pages.

→ Keeps the collected pages relevant and easy to use for research or newsletters.

Buldrr AI
