Opening Problem Statement
Meet Sarah, a data analyst who regularly needs to extract information from webpages such as GitHub issue lists, or to suggest engaging activities based on participant preferences. Every day, she spends hours manually scraping webpages or juggling API calls across separate tools. This not only wastes valuable time but also leads to frequent mistakes in data formatting and integration. Sarah wishes for an automated way to harness AI that can intelligently interact with web data and APIs without building complex subworkflows or handling tedious response formatting.
This scenario is precisely what our n8n workflow tackles — empowering you to set up AI agents that can scrape webpages and call APIs seamlessly, cutting down your workflow nodes and boosting productivity.
What This Automation Does
When this workflow runs, it does the following:
- Fetches and scrapes the latest GitHub issues from the n8n repository by having the AI agent call a web scraping API.
- Suggests personalized activities based on user input by querying an activity API with parameters like type and participant count.
- Processes AI responses via the OpenAI Chat Model to understand queries and craft actionable prompts dynamically.
- Uses the n8n Langchain Agent nodes for orchestrating AI model interactions alongside external HTTP API calls.
- Drastically reduces workflow complexity by replacing traditional subworkflows and manual response formatting steps with integrated AI tools.
- Enables customizable inputs via manual triggers and Set nodes that define chat prompts and API query parameters.
The benefits are clear — you can automate complex AI-driven web scraping and API interactions in a single workflow, saving hours of manual labor and reducing errors from fragmented processes.
Prerequisites ⚙️
- n8n account with access to the Langchain Agent nodes.
- OpenAI API account credentials configured inside n8n for AI language modeling.
- Firecrawl API key for web scraping capabilities (configured in HTTP header authentication).
- Bored API for activity suggestions accessible via a public endpoint.
- Basic familiarity with n8n editor to navigate nodes and set credentials.
- Optional: Self-hosting the n8n instance for full data control and scalability — consider Hostinger for reliable hosting.
Step-by-Step Guide
1. Adding the Manual Trigger
In n8n, start by dragging a Manual Trigger node onto the canvas and naming it “When clicking ‘Test workflow’”. This will allow you to manually kick off the workflow.
You should see a button labeled “Execute Workflow” when testing. This enables rapid iteration without setting external triggers.
Common mistake: Forgetting to connect subsequent nodes to this trigger will result in no action upon manual activation.
2. Setting Input Prompts with Set Nodes
Add two Set nodes named “Set ChatInput” and “Set ChatInput1”. In each, configure an assignment for a string variable “chatInput”:
- For "Set ChatInput", enter: Can get the latest 10 issues from https://github.com/n8n-io/n8n/issues?
- For "Set ChatInput1", enter: Hi! Please suggest something to do. I feel like learning something new!
These inputs simulate user queries regarding GitHub scraping and activity suggestions, respectively.
Visual: You’ll see the assigned string values appear in the node output during execution.
Common mistake: Mistyping the variable name “chatInput” will cause downstream nodes to fail receiving correct input.
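To make that variable name concrete, here is a minimal TypeScript sketch of the item shape the Set node emits. This is an illustration only, not n8n internals; it simply shows what the expression {{$json.chatInput}} resolves to downstream.

```typescript
// Illustrative only: n8n items carry their fields under a `json` key, so a
// Set node assignment named "chatInput" produces an item shaped like this.
type WorkflowItem = { json: { chatInput: string } };

const item: WorkflowItem = {
  json: {
    chatInput: "Can get the latest 10 issues from https://github.com/n8n-io/n8n/issues?",
  },
};

// The expression {{$json.chatInput}} in the AI Agent node reads this field.
const agentText = item.json.chatInput;
console.log(agentText);
```

If the field were accidentally named "chatinput" or "chat_input", the expression above would resolve to nothing, which is exactly the common mistake described.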
3. Processing Queries with AI Agents
Place two Langchain Agent nodes named “AI Agent” and “AI Agent1”. Configure them to use the input variable {{$json.chatInput}} as text and the “define” prompt type.
These nodes act as orchestrators, taking the user query and deciding how to handle it with AI language models and tools.
Common mistake: Not linking the input correctly or misconfiguring the prompt type can disrupt proper AI interactions.
4. Integrating OpenAI Chat Models
Add two OpenAI Chat Model nodes to serve as language model engines for the agents. Link “OpenAI Chat Model” to “AI Agent” and “OpenAI Chat Model1” to “AI Agent1”.
Ensure your OpenAI API credentials are selected under each. This setup allows natural language understanding and generation capabilities.
Common mistake: Using expired or missing API keys leads to authentication errors.
5. Web Scraping with HTTP Request Tool
Use the Webscraper Tool node configured to POST to https://api.firecrawl.dev/v0/scrape with parameters:
- url: the target webpage, passed dynamically by the agent (for example, the GitHub issues URL).
- pageOptions: a JSON object that cleans up the content (onlyMainContent: true, replaceAllPathsWithAbsolutePaths: true, removeTags: 'img,svg,video,audio').
This node calls the Firecrawl API to scrape webpage content optimized for your AI agent’s use.
Common mistake: Omitting authentication header or misformatting the JSON body causes failed requests.
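If you want to sanity-check the request outside n8n, here is a hedged TypeScript sketch of the call the Webscraper Tool makes. The endpoint and body fields come from the node configuration above; the Bearer-style Authorization header is an assumption, so match it to however your Firecrawl HTTP Header Auth credential is set up.

```typescript
// Sketch of the Firecrawl scrape request, assuming a Bearer token header.
const FIRECRAWL_API_KEY = process.env.FIRECRAWL_API_KEY ?? "";

async function scrapePage(url: string) {
  const res = await fetch("https://api.firecrawl.dev/v0/scrape", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${FIRECRAWL_API_KEY}`, // assumption: adjust to your credential setup
    },
    body: JSON.stringify({
      url, // supplied dynamically by the AI agent at runtime
      pageOptions: {
        onlyMainContent: true,
        replaceAllPathsWithAbsolutePaths: true,
        removeTags: "img,svg,video,audio",
      },
    }),
  });
  if (!res.ok) throw new Error(`Firecrawl request failed: ${res.status}`);
  return res.json();
}

// Example: the GitHub issues page used in the first prompt.
scrapePage("https://github.com/n8n-io/n8n/issues").then((data) => console.log(data));
```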
6. Calling the Activity Suggestion API
Add the Activity Tool node to GET from https://bored-api.appbrewery.com/filter with query parameters type and participants. This lets your AI request suggested activities fitting user preferences.
Example: type=education, participants=1
Common mistake: Forgetting to send query parameters results in generic or empty API responses.
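For reference, an equivalent standalone call looks like the sketch below. The endpoint and parameter names come from the node configuration, and the values match the example above.

```typescript
// Minimal sketch of the Activity Tool's GET request.
async function suggestActivity(type: string, participants: number) {
  const params = new URLSearchParams({ type, participants: String(participants) });
  const res = await fetch(`https://bored-api.appbrewery.com/filter?${params}`);
  if (!res.ok) throw new Error(`Activity API request failed: ${res.status}`);
  return res.json();
}

// Example from the text: one participant looking for something educational.
suggestActivity("education", 1).then((activities) => console.log(activities));
```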
7. Connecting Nodes for Data Flow
Link the manual trigger node “When clicking ‘Test workflow’” to both “Set ChatInput” and “Set ChatInput1” nodes. From there, connect each to their respective AI Agent nodes, which connect further to their associated language model and tool.
This ensures two simultaneous AI-driven flows — one for scraping GitHub issues and one for activity suggestion.
Visual confirmation: When executing, you’ll see output data streams from each branch reflecting fetched and processed information.
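To see why each branch needs an agent, a chat model, and a tool wired together, the following TypeScript sketch illustrates the tool-calling loop conceptually. It is not what the Langchain Agent node literally executes; the model name and the tool schema are assumptions chosen for illustration.

```typescript
// Conceptual sketch: the chat model receives the user's chatInput plus a tool
// schema, and may answer with a tool call that the agent then executes.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function runAgentTurn(chatInput: string) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumption: any chat-capable model works here
    messages: [{ role: "user", content: chatInput }],
    tools: [
      {
        type: "function",
        function: {
          name: "webscraper_tool", // hypothetical name mirroring the Webscraper Tool node
          description: "Scrape a webpage and return its main content",
          parameters: {
            type: "object",
            properties: { url: { type: "string", description: "Page to scrape" } },
            required: ["url"],
          },
        },
      },
    ],
  });

  // If the model decides scraping is needed, it returns a tool call instead of
  // plain text; the agent runs the tool and feeds the result back to the model.
  return response.choices[0].message;
}

runAgentTurn("Can get the latest 10 issues from https://github.com/n8n-io/n8n/issues?")
  .then((msg) => console.log(msg.tool_calls ?? msg.content));
```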
8. Using Sticky Notes for Documentation
Leverage the Sticky Note nodes to add descriptive documentation right inside the workflow canvas. This is helpful for team members or future edits to understand:
- That this workflow is a remake of previous AI scraping and API calling designs.
- The specific changes applied such as replacing subworkflows and manual formatting.
- Instructions and helpful tips on the HTTP request node usage and community support links.
Customizations ✏️
- Adjust Activity API Parameters: In the "Activity Tool" HTTP Request node, modify the type or participants query parameters to get suggestions tailored to different group sizes or activity types.
- Change Target Scraper URL: In "Set ChatInput", update the URL string to any webpage you want scraped (e.g., blog posts or news sites). The AI agent will dynamically fetch and process that page.
- Expand AI Model Options: Switch the OpenAI Chat Model nodes to use GPT-4 or other versions by updating your API key permissions and node settings for improved language understanding or generation.
- Modify Response Parsing Logic: Although this workflow uses optimized responses, you can add Code nodes after scraping to fine-tune or reformat the data presentation to your needs; see the sketch after this list.
- Integrate Additional Tools: Add more Langchain HTTP Request tools for other APIs and link them to new AI Agent nodes to broaden the AI assistant’s capabilities beyond web scraping and activity suggestions.
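As a starting point for that Code node, here is a hedged sketch for a node set to "Run Once for All Items". The data.markdown property path is an assumption about the scraper's response shape, so inspect the actual output of your scraping branch and adjust it.

```typescript
// Paste into an n8n Code node ("Run Once for All Items"); this is plain
// JavaScript-compatible code. `$input` is an n8n runtime global, and the
// data.markdown path is an assumption to verify against your scraper output.
const results = [];

for (const item of $input.all()) {
  const scraped = item.json.data?.markdown ?? "";
  results.push({
    json: {
      // Trim long pages so later prompts stay within model context limits.
      content: scraped.slice(0, 2000),
      processedAt: new Date().toISOString(),
    },
  });
}

return results;
```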
Troubleshooting 🔧
Problem: “Authentication failed for Firecrawl API”
Cause: Your API key is missing, expired, or incorrectly placed in HTTP Header Auth credentials.
Solution: Go to the Credentials tab in n8n, find your Firecrawl API entry, verify the key is current and ensure it’s assigned correctly in the “Webscraper Tool” node under “HTTP Header Auth”.
Problem: “OpenAI API request rejected”
Cause: Invalid or expired OpenAI API credentials or rate limiting.
Solution: Check your OpenAI account status, update API credentials in n8n, and monitor OpenAI usage limits. Also, verify the correct API key is selected in the “OpenAI Chat Model” nodes.
Problem: “No output or empty response from Activity Tool”
Cause: Missing required query parameters or incorrect parameter names.
Solution: Review and correctly set the “type” and “participants” fields in the “Activity Tool” node parameters. Test with known valid values.
Pre-Production Checklist ✅
- Verify API keys for both Firecrawl and OpenAI are correctly configured and active.
- Test the manual trigger and confirm that both branches execute and yield the expected data.
- Confirm the HTTP POST body format in the Webscraper Tool aligns with API documentation.
- Check that AI Agents receive the correct input from Set nodes and send responses properly.
- Review Sticky Notes content to ensure process clarity for team handoffs.
Deployment Guide
After thorough testing, either keep the Manual Trigger for on-demand use, or replace it with a Schedule or Webhook trigger and activate the workflow so it runs automatically.
Monitor workflow executions in the n8n dashboard logs to catch any runtime errors or interruptions.
For scaling, consider hosting n8n on a dedicated server or cloud instance to handle higher API call volumes and concurrent runs.
FAQs
Q: Can I replace Firecrawl with another web scraping API?
A: Yes, as long as the API supports a similar POST request with JSON body format. You will need to adjust the HTTP Request node accordingly.
Q: Does calling OpenAI and Firecrawl APIs incur extra costs?
A: Yes, both services charge based on usage. Monitor your API credits to avoid unexpected charges.
Q: Is my data processed securely?
A: Data sent to OpenAI and Firecrawl is transferred via HTTPS, ensuring encrypted transmission. For sensitive data, consider self-hosting n8n.
Conclusion
By following this guide, you have built a sophisticated n8n workflow where AI agents intelligently scrape webpages and call APIs to provide actionable insights like GitHub issues or tailored activity suggestions.
This automation saves substantial time otherwise spent on manual data extraction and crafting API calls, while reducing errors thanks to integrated AI orchestration.
Next steps could be extending this workflow to include AI-generated reports, adding scheduling for periodic runs, or integrating other APIs such as social media monitoring tools for richer data intelligence.
Keep experimenting and evolving your AI automations, and enjoy the power of n8n combined with advanced AI agents! ⚙️