1. Opening Problem Statement
Meet Tom, a developer who spends hours manually collecting data from multiple sources like GitHub, Wikipedia, and mock APIs for his reports and projects. Navigating through paginated API responses, extracting specific data points, and transforming messy outputs into neat items are tedious and error-prone when done manually. For Tom, this repetitive juggling wastes precious development time and increases the risk of overlooked data, leading to delayed deliverables.
This exact scenario is what the n8n workflow we’ll explore today solves — automating complex HTTP request handling with pagination, HTML extraction, and item splitting, dramatically reducing manual data processing time and mistakes.
2. What This Automation Does
When you run this workflow, it performs multiple HTTP requests to different endpoints and processes the responses intelligently. Here’s what happens specifically:
- Fetches a list of mock albums from a test API and converts the raw JSON response into manageable individual items.
- Loads a random Wikipedia page and extracts the article title from its HTML content using a CSS selector.
- Retrieves the repositories starred by a specified GitHub user through the GitHub API, handling pagination seamlessly to collect all pages automatically.
- Splits bulky body contents from HTTP responses into separate list items for easier downstream processing.
- Uses conditional logic to determine when all paginated data has been fetched, ensuring complete data extraction without manual intervention.
- Provides real-time control by starting the workflow manually, letting you test and understand each step’s output interactively.
Combined, these outcomes free you from repetitive data retrieval and parsing tasks, saving hours and minimizing errors, especially for developers and data enthusiasts working with web APIs.
3. Prerequisites
- n8n Account (cloud or self-hosted)
- Access to the GitHub API (only public endpoints are used here; a GitHub personal access token is required for private data)
- Internet connection to query Wikipedia, jsonplaceholder.typicode.com, and GitHub APIs
- Basic familiarity with HTTP request concepts is helpful but not mandatory
Optional: If you want full control and privacy, consider self-hosting n8n. Learn how at buldrr.com/hostinger.
4. Step-by-Step Guide
Step 1: Start with the Manual Trigger
In the n8n editor, locate the Manual Trigger node named On clicking ‘execute’. This node lets you run the workflow on demand.
Navigation: Click the workflow canvas > Find the “On clicking ‘execute’” node.
Visual: You’ll see a green manual trigger node ready to start the flow.
Outcome: Workflow runs only when you click “Execute Workflow”.
Common mistake: Forgetting to trigger manually will keep the workflow idle.
Step 2: Set Initial Parameters
This Set node initializes three key variables:
- page – controls the pagination page (starts empty)
- perpage – number of results per page, set to 15
- githubUser – the GitHub username to fetch starred repos from, e.g. “that-one-tom”
Navigation: Click the Set node following the manual trigger.
Configuration: Under “Values,” add these variables exactly as above.
Outcome: Prepares parameters used in later HTTP requests.
Common mistake: Not providing a valid GitHub username will lead to empty or error responses.
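If it helps to see the three values side by side, here is a minimal sketch of the same parameters written as a plain Python dictionary; this is an illustration only, not something the workflow needs, and “that-one-tom” is just the example username from above.

```python
# Illustration only: the three Set node values written out as a plain dictionary.
workflow_params = {
    "page": "",              # pagination page, starts empty
    "perpage": 15,           # results per GitHub API request
    "githubUser": "that-one-tom",
}
print(workflow_params)
```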
Step 3: Fetch Mock Album Data
The HTTP Request – Get Mock Albums node queries https://jsonplaceholder.typicode.com/albums for sample album data.
Navigation: Click the node named HTTP Request – Get Mock Albums.
Settings: Method GET, URL as above.
Outcome: Gets a complete JSON array of albums.
Common mistake: Not setting response options correctly may lead to partial data.
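For reference, the same request outside n8n is a one-liner with Python's requests library. This is only an illustration of what the node does, not part of the workflow itself:

```python
import requests

# Same call the node performs: a plain GET against the mock albums endpoint.
response = requests.get("https://jsonplaceholder.typicode.com/albums")
albums = response.json()          # a JSON array of sample album objects
print(len(albums), albums[0])
```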
Step 4: Create Item Lists from Album Data
The Item Lists – Create Items from Body node splits the JSON response into manageable list items.
Navigation: Select the node and confirm the field to split is set as body.
Visual: You’ll see multiple separate items from the original array.
Outcome: Easier processing of album entries downstream.
Common mistake: Splitting the wrong field will result in empty or incorrect outputs.
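Conceptually, the Item Lists node turns one item that holds an array into many items, one per element. Below is a rough Python equivalent, reusing the albums array from the previous sketch; the {"json": ...} wrapping mirrors how n8n passes items between nodes:

```python
# Rough equivalent of "Item Lists - Create Items from Body": one array in,
# many individual items out, each wrapped the way n8n hands items to the next node.
items = [{"json": album} for album in albums]
print(len(items), items[0])
```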
Step 5: Load Random Wikipedia Page
The HTTP Request – Get Wikipedia Page node fetches a random article page using the https://en.wikipedia.org/wiki/Special:Random URL.
Settings: GET method, with redirects followed automatically and response set to download the full HTML as a file.
Outcome: Raw HTML content ready for extraction.
Common mistake: Forgetting to enable “Follow Redirects” means the Special:Random redirect is never followed, so no article HTML comes back.
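Outside n8n, the equivalent fetch looks like the sketch below; requests follows redirects by default, which is made explicit here to mirror the node setting (illustration only):

```python
import requests

# Special:Random answers with a redirect to a random article, so redirects must be followed.
response = requests.get("https://en.wikipedia.org/wiki/Special:Random", allow_redirects=True)
html = response.text    # full HTML of the random article
print(response.url)     # the article URL we ended up on after the redirect
```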
Step 6: Extract Article Title from Wikipedia HTML
Use the HTML Extract – Extract Article Title node to parse the binary HTML payload and extract the article title using the CSS selector #firstHeading.
Navigation: Double-click this node and check settings.
Outcome: You receive the page title in JSON format for later use.
Common mistake: Incorrect CSS selector yields no data.
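The extraction itself is a plain CSS-selector lookup. Here is a hedged Python sketch with BeautifulSoup that applies the same #firstHeading selector to the HTML fetched above; it illustrates the idea, not what n8n runs internally:

```python
from bs4 import BeautifulSoup

# Parse the article HTML and grab the same element the node targets: #firstHeading.
soup = BeautifulSoup(html, "html.parser")
heading = soup.select_one("#firstHeading")
title = heading.get_text(strip=True) if heading else None
print(title)
```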
Step 7: Fetch GitHub Stars with Pagination
The core of this workflow, the HTTP Request – Get my Stars node, calls the GitHub API to fetch the starred repositories for the user defined in the Set node.
This uses query parameters for page (initially 1) and per_page (15) to manage pagination.
Navigation: Edit node, URL set dynamically as https://api.github.com/users/{{$node["Set"].json["githubUser"]}}/starred.
Outcome: Returns a page of starred repos.
Common mistake: Not handling pagination will return only the first page of results.
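A single page of that call, written out in Python for clarity; the username and paging values are just the example defaults from the Set node:

```python
import requests

github_user = "that-one-tom"              # example value from the Set node
params = {"page": 1, "per_page": 15}      # one page of up to 15 starred repos

# One page of starred repositories, like a single run of the HTTP Request node.
response = requests.get(f"https://api.github.com/users/{github_user}/starred", params=params)
stars = response.json()                   # an empty list means there are no more pages
print(len(stars))
```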
Step 8: Split Response Body into Items
The Item Lists – Fetch Body node segments the star data JSON into individual items, enabling easier iteration.
Navigation: Confirm the field to split is set to body.
Outcome: Prepares each star for conditional checking.
Common mistake: Skipping this splitting step complicates further processing.
Step 9: Check if More Pages Exist
Use the If – Are we finished? node to verify if the response’s body is empty, meaning no more starred repositories are left to fetch.
Navigation: The condition uses {{$node["HTTP Request - Get my Stars"].json["body"]}} to check emptiness.
Outcome: If empty, stops pagination; otherwise, continues.
Common mistake: Wrong condition logic keeps the loop running endlessly.
Step 10: Increment Page Number for Pagination
When more pages exist, the Set – Increment Page node increases the page number by one to request the next batch in the next iteration.
Configuration snippet: page = {{$node["Set"].json["page"] + 1}}
Outcome: Pagination loop advances.
Common mistake: Incrementing the wrong variable or resetting page accidentally.
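Putting steps 7 through 10 together, the whole loop (fetch a page, check for an empty body, increment, repeat) collapses into a few lines of Python. This is only a sketch of the control flow; in n8n the same loop is built by wiring the increment node back into the HTTP Request node:

```python
import requests

# Sketch of the fetch -> check -> increment loop from steps 7-10 (illustration only).
github_user = "that-one-tom"   # example value from the Set node
per_page = 15
page = 1
all_stars = []

while True:
    response = requests.get(
        f"https://api.github.com/users/{github_user}/starred",
        params={"page": page, "per_page": per_page},
    )
    batch = response.json()
    if not batch:              # empty body -> the "Are we finished?" check says stop
        break
    all_stars.extend(batch)
    page += 1                  # same idea as the "Set - Increment Page" node

print(f"Collected {len(all_stars)} starred repositories across {page - 1} page(s).")
```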
5. Customizations
- Change GitHub User: In the Set node, modify the githubUser value to fetch stars for any GitHub username you want.
- Adjust Pagination Size: In the same Set node, update perpage to control how many items each GitHub API request retrieves.
- Extract Different Wikipedia Data: In the HTML Extract – Extract Article Title node, change the CSS selector to capture other elements, like .infobox or .mw-parser-output p (see the sketch after this list).
- Add More Data Sources: Duplicate the HTTP Request nodes and configure new URLs to fetch other APIs, then process as needed with Item Lists or HTML Extract nodes.
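As a quick illustration of swapping selectors, the BeautifulSoup sketch from step 6 only needs a different selector; for example, .mw-parser-output p selects the article paragraphs instead of the heading (again just an illustration outside n8n):

```python
# Reusing the parsed `soup` from the step 6 sketch: select article paragraphs
# instead of the #firstHeading title.
paragraphs = [p.get_text(strip=True) for p in soup.select(".mw-parser-output p")]
print(paragraphs[:3])   # first few paragraphs of the random article
```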
6. Troubleshooting
- Problem: “HTTP Request returns empty or error response”
Cause: Incorrect URL, network issues, or missing query parameters.
Solution: Verify URLs, check your internet connection, and make sure query parameters like page and per_page are correctly set.
- Problem: “HTML Extract node returns no data”
Cause: The CSS selector is wrong or the HTML response format changed.
Solution: Open the node, inspect the HTML returned by the previous node, and update the CSS selector accordingly.
- Problem: “Pagination loop runs endlessly”
Cause: The If condition never becomes true due to an improper empty check or data format issues.
Solution: Adjust the If – Are we finished? node condition to correctly detect when the data array is empty. Test with smaller pagination values first.
7. Pre-Production Checklist
- Ensure the GitHub username in Set node is valid.
- Test the HTTP requests individually by executing each node on its own.
- Confirm pagination increments and stops as expected.
- Validate the HTML extraction by previewing output data.
- Backup workflow JSON before large changes.
8. Deployment Guide
Run this workflow by clicking the “Execute Workflow” button, which fires the Manual Trigger node. Monitor the execution through the n8n UI to see each step’s output.
This workflow is designed to run interactively for data collection, but can be scheduled with a Cron node if automation on intervals is desired. Use n8n’s built-in logging to track progress and errors.
9. FAQs
- Q: Can I use this with private GitHub repositories?
A: Yes, but you need to authenticate the HTTP Request node with a GitHub credential or a personal access token (see the sketch below).
- Q: Does this workflow consume API rate limits?
A: Yes, the GitHub API has rate limits; keep the pagination size reasonable to avoid hitting them.
- Q: Can I extract other Wikipedia data besides the title?
A: Yes, change the CSS selector in the HTML Extract node accordingly.
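If you do need authenticated GitHub calls, the request simply carries a token header. A hedged Python sketch is below; the token value is a placeholder, and in n8n you would attach a GitHub credential to the HTTP Request node instead:

```python
import requests

# Placeholder token: replace with a real personal access token; keep it out of source control.
headers = {"Authorization": "Bearer YOUR_PERSONAL_ACCESS_TOKEN"}

# Starred repositories of the authenticated user.
response = requests.get(
    "https://api.github.com/user/starred",
    headers=headers,
    params={"page": 1, "per_page": 15},
)
print(response.status_code)
```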
10. Conclusion
Congratulations! By building this n8n workflow, you’ve automated fetching and processing data from multiple HTTP-based sources like GitHub, Wikipedia, and mock APIs, and saved yourself the hours you would otherwise spend manually scraping, parsing, and paging through API data, all without writing code.
Now that you’re comfortable with HTTP requests, pagination handling, and HTML extraction, consider expanding this automation to:
- Automate social media data pulls (Twitter API, Instagram)
- Compile regular reports from external web sources
- Feed processed data directly into databases or dashboards
Keep experimenting with n8n’s rich nodes to transform tedious manual tasks into reliable, scalable automations.