Scrape Multipage Websites with Jina.ai and n8n

This workflow automates scraping multipage websites with Jina.ai, without needing an API key. It selects relevant pages from a site, extracts their content, and saves it directly to Google Drive, saving hours of manual data collection.
Workflow Identifier: 1957
Nodes in use: manualTrigger, set, httpRequest, xml, splitOut, filter, limit, splitInBatches, code, googleDrive, wait, stickyNote
Scrape websites with n8n and Jina.ai



What this workflow does

This workflow reads a website's sitemap to get the list of its pages.

It keeps only pages whose URLs contain “agent” or “tool”, plus the homepage.

Then it scrapes those pages with Jina.ai to get each page's title and markdown content.

Finally, it saves each page as a markdown file in Google Drive.

This saves a lot of time because you do not need to copy and paste content manually.


Who should use this workflow

This workflow is for people who need to collect web content quickly and regularly.

It is mainly aimed at content managers and researchers who track many AI tool pages.

No deep technical skills are needed, but basic familiarity with n8n helps.


Tools and services used

  • n8n: To build and run the automation.
  • Jina.ai: To scrape web pages without an API key.
  • Google Drive: To save the scraped markdown files automatically.
  • HTTP Request node: To fetch the sitemap and call the Jina.ai scraper.
  • XML node: To parse sitemap XML into JSON.

Workflow input, processing, and output

Inputs

The workflow starts with a sitemap URL.

This URL points to an XML file that lists the pages on the target website.
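For reference, a standard sitemap (as defined by the sitemaps.org protocol) is a `<urlset>` element with one `<url>` entry per page, each holding the page address in a `<loc>` element. The plain-Python sketch below imitates what the XML and Split Out nodes do with such a file. It is an illustration, not the workflow's own code: the helper name `sitemap_urls` and the sample XML are hypothetical, and sitemap index files (which list other sitemaps instead of pages) are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a flat sitemap (hypothetical helper).

    Parsing mirrors the XML node; returning one entry per URL mirrors
    the Split Out node. <sitemapindex> files are not handled here.
    """
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample sitemap with three pages.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/tools/search</loc></url>
  <url><loc>https://example.com/blog/agent-basics</loc></url>
</urlset>"""

print(sitemap_urls(sample))
# ['https://example.com/', 'https://example.com/tools/search', 'https://example.com/blog/agent-basics']
```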

Processing steps

  • Fetch the sitemap XML with the HTTP Request node.
  • Convert the sitemap XML to JSON with the XML node.
  • Split the JSON into a list of individual URLs.
  • Keep only URLs that contain keywords such as “agent” or “tool”, or that equal the homepage URL.
  • Limit the list to the first 20 URLs.
  • Process the URLs in batches to avoid overloading the target site.
  • Call the Jina.ai scraper for each URL to get the page content.
  • Use a Code node to parse the Jina.ai response into a clean title and markdown content.
  • Save each page as a markdown file in Google Drive, named with the page title and URL.
  • Wait between batches to stay within the target server's rate limits.
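The filter, limit, batch, and wait steps above can be approximated in plain Python as follows. This is a sketch under stated assumptions, not the workflow's actual node configuration: the homepage URL and batch size are hypothetical placeholders, keyword matching is case-insensitive here (the real Filter node may be configured differently), and `scrape` stands in for whatever function calls Jina.ai.

```python
import time
from typing import Callable, Iterator

# Keywords come from the workflow description; HOMEPAGE and BATCH_SIZE
# are hypothetical placeholders.
KEYWORDS = ("agent", "tool")
HOMEPAGE = "https://example.com/"
MAX_URLS = 20    # mirrors the Limit node
BATCH_SIZE = 5   # hypothetical batch size


def keep_url(url: str) -> bool:
    """Filter step: keep the homepage or any URL containing a keyword (case-insensitive)."""
    lowered = url.lower()
    return url == HOMEPAGE or any(word in lowered for word in KEYWORDS)


def select_urls(urls: list[str]) -> list[str]:
    """Filter, then Limit: keep matching URLs and truncate to MAX_URLS."""
    return [u for u in urls if keep_url(u)][:MAX_URLS]


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of `size` items, like the splitInBatches node."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run(urls: list[str], scrape: Callable[[str], object],
        pause_seconds: float = 1.0) -> list[object]:
    """Scrape selected URLs batch by batch, pausing between batches (the Wait step)."""
    results: list[object] = []
    for index, batch in enumerate(batches(select_urls(urls), BATCH_SIZE)):
        if index > 0:
            time.sleep(pause_seconds)  # no pause before the first batch
        results.extend(scrape(url) for url in batch)
    return results
```

Passing `pause_seconds=0` allows a quick dry run with a stub `scrape` function; for real runs, a pause of a few seconds keeps the request rate modest for both the target site and Jina.ai.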

Output

The workflow creates multiple markdown files in Google Drive.

Each file contains one scraped page: its title and its content as markdown.
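To show how a scraped page might become such a file, here is a sketch of the parsing and naming logic. It assumes Jina Reader's default plain-text response, which starts with a `Title:` line and places the page body after a `Markdown Content:` marker; inspect a real response before relying on this, since the format can include extra lines or change over time. The helper names and the file-name sanitization rule are hypothetical choices, not taken from the workflow's Code node.

```python
import re


def parse_jina_response(text: str) -> dict:
    """Split a Jina Reader plain-text response into title and markdown (hypothetical helper)."""
    # [ \t]* (not \s*) so an empty title line cannot swallow the next line.
    title_match = re.search(r"^Title:[ \t]*(.*)$", text, flags=re.MULTILINE)
    title = title_match.group(1).strip() if title_match else "Untitled"
    marker = "Markdown Content:"
    idx = text.find(marker)
    # Everything after the marker is the page body; fall back to the whole text.
    markdown = text[idx + len(marker):].strip() if idx != -1 else text.strip()
    return {"title": title, "markdown": markdown}


def file_name(title: str, url: str) -> str:
    """Build a file name from the page title and URL (hypothetical naming rule)."""
    raw = f"{title} - {url}"
    # Collapse runs of characters that are awkward in file names into "_".
    safe = re.sub(r'[\\/:*?"<>|]+', "_", raw)
    return safe[:200] + ".md"


def file_body(title: str, url: str, markdown: str) -> str:
    """Markdown file contents: title heading, source URL, then the page body."""
    return f"# {title}\n\nSource: {url}\n\n{markdown}\n"
```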


Beginner step-by-step: How to use this workflow in n8n

Step 1: Import workflow

  1. Download the workflow file using the Download button on this page.
  2. Open the n8n editor where you want to run the automation.
  3. Click “Import from File” and select the downloaded workflow file.

Step 2: Configure credentials and settings

  1. Add Google Drive OAuth credentials if they are not connected yet.
  2. Check the sitemap URL in the Set node named “Set Website URL” and update it if needed.
  3. To save files to a different Google Drive folder, update the folder ID in the “Save Webpage Contents to Google Drive” node.
  4. Review the keyword filters in the “Filter By Topics or Pages” node so they match your topics.

Step 3: Test the workflow

  1. Click the Manual Trigger node labeled “When clicking ‘Test workflow’”.
  2. Run the workflow once. Watch for errors or empty results.

Step 4: Activate workflow for production

  1. If regular scraping is needed, replace the Manual Trigger with a Schedule Trigger node set to the interval you want.
  2. Switch the workflow to Active in n8n.
  3. Monitor the Google Drive folder for new scraped markdown files.

If you self-host n8n, see the n8n self-hosting documentation for hosting options.


Customization ideas

  • Add more keywords to the filter node to capture other page topics.
  • Change the maximum in the Limit node to process more or fewer URLs.
  • Change the target Google Drive folder to keep files organized.
  • Insert extra Code nodes for additional content cleanup or analytics.

Edge cases and failures

404 error fetching sitemap

The sitemap URL may be wrong, or the site may not publish a sitemap.

Open the sitemap URL in a browser to confirm it is reachable before running the workflow.

Empty or wrong data from Jina.ai scrape

Check that the URLs sent to Jina.ai are complete and correct.

Test the Jina.ai endpoint manually by calling it with a sample URL.
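A manual test can be as simple as fetching one page through Jina's Reader endpoint. The sketch below assumes the workflow uses the public `https://r.jina.ai/` prefix, where the full target URL is appended to the endpoint; check the workflow's HTTP Request node for the exact URL it calls. The function names are hypothetical, and keyless requests are subject to Jina's rate limits.

```python
from urllib.request import Request, urlopen

# Assumed endpoint: Jina Reader's public prefix (verify against the HTTP Request node).
JINA_READER = "https://r.jina.ai/"


def reader_url(target: str) -> str:
    """Prefix a full page URL with the Jina Reader endpoint; reject relative paths."""
    if not target.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute URL, got {target!r}")
    return JINA_READER + target


def fetch_via_jina(target: str, timeout: float = 30.0) -> str:
    """Fetch one page through Jina Reader and return the response text."""
    req = Request(reader_url(target), headers={"User-Agent": "jina-manual-test"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (makes a network call):
# print(fetch_via_jina("https://example.com/")[:300])
```

Rejecting relative URLs up front catches one easy-to-miss cause of bad results: a path such as `/tools` sent in place of a full URL.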

Google Drive file save errors

Verify that the Google Drive credentials in n8n have write access to the target folder.

Reauthenticate if a permission issue occurs.


Summary

✓ Automates downloading many related pages listed in a site's sitemap.

✓ Filters pages for relevance using URL keywords.

✓ Extracts a structured title and markdown content with Jina.ai, without an API key.

✓ Saves each page as a markdown file in Google Drive for easy access.

→ Saves many hours of manually copying and checking pages.

→ Keeps the collected content relevant and easy to use for research or newsletters.



Frequently Asked Questions

The workflow uses the website sitemap XML URL, fetches it, converts it to JSON, and extracts all URLs from it.
This workflow requires a sitemap URL to find pages. Scraping a site without a sitemap requires modifying the workflow.
The Jina.ai scraper extracts the page title and markdown content from each URL without needing an API key.
By updating the folder ID or path in the Google Drive node, the user controls where markdown files are saved.
Author
Written By
Ritu Sanjali
