What this workflow does
This workflow collects startup funding news from TechCrunch and VentureBeat sitemaps.
It saves analysts time by automating the extraction and research of funding details.
It outputs clean, structured funding data stored in Airtable.
The workflow fetches XML sitemaps, finds articles with funding keywords, scrapes article content, and uses AI to pull structured company data.
It also enriches the data with company research and merges everything into a final record.
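The fetch-and-parse stage (HTTP Request, XML Parse, and Split Out) can be sketched outside n8n as follows. This is a minimal Python illustration, not the workflow's actual node configuration. It assumes each sitemap is a standard sitemaps.org `<urlset>` document rather than a sitemap index, and the example.com URLs are made up.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset>, <url>, and <loc>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    urls = []
    # Each <url> child is one article, the same unit the Split Out step produces.
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.find("sm:loc", SITEMAP_NS)
        if loc is not None and loc.text:
            urls.append(loc.text.strip())
    return urls


# Hypothetical sample sitemap; real ones are fetched over HTTP.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/2024/01/acme-raises-10m/</loc></url>
  <url><loc>https://example.com/2024/01/gadget-review/</loc></url>
</urlset>"""

print(sitemap_urls(sample))
```

If a source publishes a sitemap index instead, its `<sitemap><loc>` entries point to further sitemaps that would need one more fetch-and-parse pass.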
Who should use this workflow
This workflow is great for market analysts tracking startup funding news.
Anyone who wants to save hours of manual reading and data entry benefits from it.
Users without deep programming skills can run it inside n8n with minimal setup.
Business intelligence teams and reporting specialists also gain from structured, up-to-date funding records.
Tools and services used
- n8n Automation Platform: Runs the workflow on n8n Cloud or on a self-hosted n8n instance.
- HTTP Request Nodes: Fetch sitemaps and article HTML pages.
- XML Parse Nodes: Convert sitemap XML to JSON.
- Split Out Nodes: Separate individual article URLs.
- Filter Nodes: Select only articles mentioning funding keywords.
- HTML Parser Nodes: Extract article titles and body text.
- LangChain AI Nodes: Use AI models like Anthropic Claude 3.5 and Perplexity LLaMA to extract structured company data and perform research.
- Airtable: Stores the final structured datasets for easy access.
- API Keys: Needed for Airtable, OpenRouter, Anthropic, and Perplexity.
How this workflow works: Inputs, Process, and Output
Inputs
- Latest sitemap XML URLs from TechCrunch and VentureBeat.
- Article URLs found in sitemaps.
- Funding keyword filter like “raise”.
- API Keys for AI models and Airtable.
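The keyword filter input can be pictured as a case-insensitive substring check on each article's URL and title. The sketch below is a hypothetical Python stand-in for the workflow's Filter node, not its exact condition. `FUNDING_KEYWORDS` and `is_funding_article` are illustrative names, and the default keyword is the document's example, "raise".

```python
# Hypothetical default keyword list; "raise" is the example keyword from the workflow.
FUNDING_KEYWORDS = ("raise",)


def is_funding_article(url: str, title: str, keywords=FUNDING_KEYWORDS) -> bool:
    """True if the URL or title contains any keyword, ignoring letter case."""
    haystack = f"{url} {title}".casefold()
    return any(kw.casefold() in haystack for kw in keywords)


articles = [
    {"url": "https://example.com/acme-raises-10m", "title": "Acme Raises $10M Series A"},
    {"url": "https://example.com/gadget-review", "title": "Gadget review"},
]
matches = [a for a in articles if is_funding_article(a["url"], a["title"])]
print([a["url"] for a in matches])
```

Plain substring matching on "raise" catches "raises" and "raised" but not "raising", so passing a longer tuple such as `("raise", "raising", "closed series", "funded")` widens the net.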
Processing Steps
- Fetch sitemap XMLs using HTTP Request nodes.
- Convert XML to JSON to list article URLs via XML Parse.
- Split JSON lists into single article URLs.
- Filter URLs and titles containing funding keywords.
- Download full article HTML pages.
- Extract clean text for titles and article body using HTML Parser.
- Merge articles from both sources into a single stream.
- Use AI (LangChain nodes) to parse unstructured article text into detailed JSON with company name, funding round, amount, investors, valuation, and URLs.
- Run an AI-based Auto-fixing Output Parser that repairs malformed JSON output.
- Query AI to find company websites to enrich profiles.
- Prepare final JSON data combining all extracted and researched fields.
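The last step combines the AI-extracted fields with the researched ones into a single record. Below is a minimal Python sketch of one possible merge rule: values taken from the article win, and research only fills gaps. The `FundingRecord` field names and the precedence rule are assumptions for illustration, not the workflow's actual schema. Amounts are kept as strings (for example "$10M") for simplicity.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Optional


@dataclass
class FundingRecord:
    """Hypothetical final record; rename fields to match your prompt and Airtable columns."""
    company_name: str
    funding_round: Optional[str] = None
    amount: Optional[str] = None
    investors: list[str] = field(default_factory=list)
    valuation: Optional[str] = None
    article_url: Optional[str] = None
    company_website: Optional[str] = None


EMPTY = (None, "", [])


def merge_record(extracted: dict, research: dict) -> FundingRecord:
    """Combine article-extracted fields with researched fields.

    Non-empty values from the article take priority; research fills the gaps.
    Keys that are not FundingRecord fields are ignored.
    """
    known = {f.name for f in fields(FundingRecord)}
    merged = {k: v for k, v in research.items() if k in known and v not in EMPTY}
    merged.update({k: v for k, v in extracted.items() if k in known and v not in EMPTY})
    return FundingRecord(**merged)  # raises TypeError if company_name is missing everywhere


extracted = {
    "company_name": "Acme",
    "funding_round": "Series A",
    "amount": "$10M",
    "investors": ["Example Ventures"],
    "company_website": None,
    "article_url": "https://example.com/acme-raises-10m",
}
research = {"company_name": "Acme Inc.", "company_website": "https://acme.example", "employee_count": 40}

record = merge_record(extracted, research)
print(asdict(record))
```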
Output
The data is stored as records in Airtable.
Users get a clean, structured table with funding rounds, investors, amounts, companies, and detailed research ready to use.
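Inside n8n the Airtable node handles the API calls. If you want to write the same records from a script, or to understand what the node sends, the Airtable Web API's create-records endpoint accepts a `records` array of up to 10 `{"fields": ...}` objects per request. The Python sketch below only prepares the requests and does not send them. The base ID, table name, token, and column names are placeholders.

```python
import json
import urllib.parse
import urllib.request

AIRTABLE_MAX_BATCH = 10  # create-records requests accept at most 10 records each


def airtable_payloads(rows: list[dict]) -> list[dict]:
    """Split rows into create-records payloads of at most 10 records each."""
    return [
        {"records": [{"fields": row} for row in rows[i:i + AIRTABLE_MAX_BATCH]]}
        for i in range(0, len(rows), AIRTABLE_MAX_BATCH)
    ]


def build_request(base_id: str, table: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Airtable Web API."""
    url = f"https://api.airtable.com/v0/{base_id}/{urllib.parse.quote(table)}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical rows with made-up column names.
rows = [{"Company": f"Company {n}", "Round": "Seed"} for n in range(23)]
payloads = airtable_payloads(rows)
print([len(p["records"]) for p in payloads])  # [10, 10, 3]

req = build_request("appXXXXXXXXXXXXXX", "Funding Rounds", "YOUR_TOKEN", payloads[0])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; left out so the sketch has no side effects.
```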
Beginner step-by-step: How to use this workflow in n8n
Importing the Workflow
- Click the Download button provided on the workflow page to save the workflow file.
- Open the n8n editor where you want to run the workflow.
- Use the “Import from File” option in n8n and select the downloaded workflow file.
Configuring API Keys and IDs
- Add your Airtable API key to the Airtable node's credentials.
- Add API keys for the OpenRouter, Anthropic, and Perplexity models in the respective LangChain AI nodes.
- Update Airtable base IDs, table names, and any folder or email references so they match your account.
Testing and Activation
- Trigger the workflow manually once to test data flow and API responses.
- Check intermediate node outputs to confirm data is correct.
- Fix any errors like missing fields or API issues.
- Activate the workflow to run on a schedule or with other triggers for production.
This approach lets beginners run the whole sequence without building it from scratch.
It needs only the import, configure, test, and activate steps inside the n8n editor.
Customization ideas
- Change funding keyword filters to include terms like “closed Series” or “funded”.
- Add other tech news sources by fetching additional sitemaps with the same parsing flow.
- Extend the AI extraction prompt to pull the CEO name, employee count, or product launches.
- Replace Airtable with tools like Google Sheets or Salesforce for data storage.
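When more sources are added, their article URLs still need to be merged into one stream before extraction. The Python sketch below shows one way to do that: it tags each URL with its source and skips URLs already seen. The de-duplication step and the `merge_sources` name are assumptions for illustration, not part of the original workflow, and the source names and URLs are placeholders.

```python
def merge_sources(per_source: dict[str, list[str]]) -> list[dict]:
    """Flatten article URLs from several sources into one stream.

    Each item is tagged with its source; a URL seen earlier (from any source)
    is skipped, so the first occurrence wins.
    """
    seen = set()
    merged = []
    for source, urls in per_source.items():
        for url in urls:
            if url in seen:
                continue
            seen.add(url)
            merged.append({"source": source, "url": url})
    return merged


stream = merge_sources({
    "techcrunch": ["https://example.com/a", "https://example.com/b"],
    "venturebeat": ["https://example.com/c", "https://example.com/a"],
    "another_source": ["https://example.com/d"],  # hypothetical extra source
})
print(len(stream))  # 4
```

Keeping a `source` tag on every item makes it easy to store the origin alongside each record, whichever storage tool replaces Airtable.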
Troubleshooting common issues
- Problem: No articles pass the funding keyword filter.
Cause: Keyword case or spelling mismatch.
Fix: Try a broader keyword or check the filter conditions, including case sensitivity.
- Problem: AI nodes output malformed JSON.
Cause: The AI model sometimes returns invalid structures.
Fix: Make sure the Auto-fixing Output Parser node is active.
- Problem: The Airtable node fails to create records.
Cause: Wrong API key or base/table information.
Fix: Confirm all credentials and table IDs.
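For the malformed-JSON problem, a cheap local fallback can run before any re-prompting, for example in a Code node. It is a different, purely local technique and not a description of how the Auto-fixing Output Parser works. The Python sketch below keeps only the text between the first `{` and the last `}`. If that fails to parse, it retries once after removing trailing commas. `parse_llm_json` is a hypothetical helper name.

```python
import json
import re

# Matches a comma directly before a closing brace or bracket, e.g. {"a": 1,}
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def parse_llm_json(text: str) -> dict:
    """Best-effort parse of one JSON object from raw model output.

    Ignores any prose or Markdown fences around the object. Raises ValueError
    if no valid JSON object can be recovered.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    candidate = text[start:end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Heuristic second attempt; it can misfire if a string value contains ",}" or ",]".
        try:
            return json.loads(TRAILING_COMMA.sub(r"\1", candidate))
        except json.JSONDecodeError as exc:
            raise ValueError(f"could not repair model output: {exc}") from exc


raw = 'Here is the data:\n```json\n{"company_name": "Acme", "investors": ["Example Ventures",],}\n```'
print(parse_llm_json(raw))
```

Anything this fallback cannot repair should still go through the Auto-fixing Output Parser.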
Pre-production checklist
- Verify all API Keys for Airtable, OpenRouter, Anthropic, and Perplexity.
- Manually open sitemap URLs in a browser to confirm access.
- Run the workflow manually and review each node's output.
- Confirm CSS selectors in HTML parser nodes still capture titles and article body.
- Test the deep research sub-workflow's inputs and outputs separately.
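When checking that extraction still works on a saved article page, it helps to run an independent check that flags empty titles or bodies. The Python sketch below uses the standard library's `HTMLParser` and simple tag rules: the first `<h1>` is taken as the title and the `<p>` elements as the body. These tag rules are an assumption standing in for the workflow's CSS selectors, which it does not reproduce. `ArticleTextParser` and `check_extraction` are hypothetical names.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Collects the first <h1> as the title and every <p> as a body paragraph."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._capturing = None  # "h1" or "p" while inside a tag we want
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Nested tags (links, <em>, ...) inside a captured tag are ignored, but their text is kept.
        if self._capturing is None and (tag == "p" or (tag == "h1" and not self.title)):
            self._capturing, self._buffer = tag, []

    def handle_endtag(self, tag):
        if tag != self._capturing:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "h1":
            self.title = text
        elif text:
            self.paragraphs.append(text)
        self._capturing = None

    def handle_data(self, data):
        if self._capturing is not None:
            self._buffer.append(data)


def check_extraction(html: str) -> dict:
    """Extract title and body text and report whether both are non-empty."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "body": "\n".join(parser.paragraphs),
        "ok": bool(parser.title) and bool(parser.paragraphs),
    }


# Hypothetical saved page.
page = (
    "<html><body><h1> Acme raises $10M </h1>"
    "<p>Acme <a href='#'>raised</a> a $10M Series A.</p>"
    "<p>Investors include Example Ventures.</p></body></html>"
)
print(check_extraction(page))
```

A result with `"ok": False` on a current article page is a signal to re-check the selectors in the HTML parser nodes.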
Deploying to production
- Add a Schedule Trigger in n8n to run the workflow regularly, for example daily.
- Monitor execution logs and data records in Airtable.
- Back up Airtable data or connect with reporting dashboards.
Summary of benefits and output
✓ Saves hours or days of manual news reading and data entry.
✓ Filters and extracts only relevant funding news automatically.
✓ Uses AI to create clean, structured data on companies and funding.
✓ Enriches data with extra company research for smarter reporting.
→ The final output is organized funding data stored in Airtable, ready to use in reports or dashboards.