Opening Problem Statement
Meet Sarah, a graduate student and research assistant who spends countless hours every week manually sifting through new academic papers published on Hugging Face. Her task: find the most relevant AI papers, extract meaningful abstracts, analyze their key contributions, results, and technical details, and then organize this information neatly into her Notion workspace for easy reference. This tedious process eats into her valuable research time, introduces errors from manual copy-pasting, and leads to missed insights due to fatigue and inconsistent note-taking. Sarah estimates she wastes at least 4-5 hours weekly on these repetitive chores, which slows down her research progress significantly.
This is a real, specific pain point: many researchers and knowledge workers face the overwhelming volume of new papers daily without an efficient way to filter, analyze, and archive them systematically.
What This Automation Does
This n8n workflow provides a hands-free, fully automated solution for Sarah and others like her. When triggered on weekdays to align with new paper releases, it:
- Fetches the latest Hugging Face research papers published the previous day via their web interface.
- Extracts the URLs of new papers and filters out those already existing in a Notion database to avoid duplicates.
- Retrieves detailed paper pages and extracts abstracts and titles programmatically.
- Uses OpenAI (through the LangChain node) to analyze each abstract deeply, summarizing core introductions, extracting keywords, highlighting important data and results, and providing classification.
- Stores these enriched summaries and metadata neatly into a Notion database for organized future access.
- Handles batching and conditional logic to ensure smooth data processing and prevent duplicate entries.
This workflow eliminates manual searching and note-taking, saving Sarah an estimated 5 hours weekly. It reduces human error, improves the depth of analysis with AI support, and keeps research notes centralized and searchable in Notion.
Prerequisites ⚙️
- n8n account for workflow automation (self-hosting option available for advanced users)
- Hugging Face free account (optional, for access rights if needed)
- Notion account with API integration and a prepared database to store paper metadata
- OpenAI API key (used through n8n’s LangChain OpenAI node) for natural language understanding and summarization
Step-by-Step Guide
1. Set Up the Schedule Trigger Node
Navigate to Triggers and add a Schedule Trigger node. Configure it to run on weekdays (Monday to Friday) at 8 AM to align with when Hugging Face typically updates their papers.
Parameters: Interval set to weekly, trigger days set to Monday through Friday, trigger hour set to 8.
You should see the trigger scheduled and ready to fire automatically on those days.
Common Mistake: Leaving the trigger on a daily schedule or selecting the wrong days, which causes empty runs on weekends (when no new papers are posted) or missed weekday updates.
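For reference, the trigger's parameters end up roughly like the sketch below (field names follow the current Schedule Trigger schema and may differ slightly between n8n versions; triggerAtDay values 1-5 map to Monday through Friday):
{
  "rule": {
    "interval": [
      {
        "field": "weeks",
        "triggerAtDay": [1, 2, 3, 4, 5],
        "triggerAtHour": 8
      }
    ]
  }
}
If your n8n version exposes different field names, configure the equivalent options in the node UI.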
2. Fetch Yesterday’s Papers from Hugging Face
Add an HTTP Request node (named “Request Hugging Face Paper”) connected to the trigger. Use the GET method to retrieve the papers page from https://huggingface.co/papers, with a date query parameter set dynamically to yesterday: {{ $now.minus(1,'days').format('yyyy-MM-dd') }}.
This returns the HTML page listing papers published on the specified date.
Common Mistake: Incorrect date formatting or forgetting to use the dynamic date function results in retrieving the wrong set of papers.
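In simplified form, the node's settings look like this (in the node UI, enable Send Query Parameters and add date there):
{
  "method": "GET",
  "url": "https://huggingface.co/papers",
  "queryParameters": {
    "date": "={{ $now.minus(1,'days').format('yyyy-MM-dd') }}"
  }
}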
3. Extract Paper URLs from the HTML
Use an HTML Extract node configured to pull links matching the paper entries. Target the selector .line-clamp-3 and extract the href attribute, returning an array of paper URLs.
This node outputs the raw URLs to be processed further.
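A sketch of the extraction settings (option names may vary slightly across versions of the HTML node):
{
  "operation": "extractHtmlContent",
  "extractionValues": {
    "values": [
      {
        "key": "url",
        "cssSelector": ".line-clamp-3",
        "returnValue": "attribute",
        "attribute": "href",
        "returnArray": true
      }
    ]
  }
}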
4. Split URLs to Process Individually
Add a Split Out node to split the extracted URLs into individual items for batch processing.
This prevents one giant payload and allows processing one paper URL at a time.
5. Loop Over Each Paper Item
Use a Split In Batches node to process the URLs in chunks, which smooths the flow and helps you stay within API rate limits. The node’s default batch size is a sensible starting point.
6. Check Each Paper’s Existence in Notion
Connect a Notion Get All node configured to search your Notion database for an existing page where the URL matches the Hugging Face paper link.
This step prevents duplicates from being stored.
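Under the hood, this amounts to a Notion database query with a filter roughly like the one below (the property name URL depends on your database schema, and the paper link is a placeholder):
{
  "filter": {
    "property": "URL",
    "url": {
      "equals": "https://huggingface.co/papers/<paper-id>"
    }
  }
}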
7. Conditional Filtering to Skip Existing Papers
Use an If node to check if the Notion query returned any results. If yes (paper exists), skip processing; if no, continue fetching details.
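One common way to express this check, assuming the Notion node has “Always Output Data” enabled so that an empty query still emits one (empty) item:
Condition: {{ $json.id }} → is empty
True branch (no existing page) → continue to the next step; False branch → skip the item.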
8. Retrieve Detailed Paper Content
Use another HTTP Request node to fetch the full detail page of each new paper by prepending “https://huggingface.co” to the extracted relative URL.
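The URL field can be built with an expression like this (assuming the relative link extracted in step 3 is available on the item as url):
{{ 'https://huggingface.co' + $json.url }}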
9. Extract Abstract and Title
Use a second HTML Extract node configured to target CSS selectors .text-gray-700 for abstract and .text-2xl for title.
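A sketch of the settings, mirroring the first extract node:
{
  "operation": "extractHtmlContent",
  "extractionValues": {
    "values": [
      { "key": "title", "cssSelector": ".text-2xl" },
      { "key": "abstract", "cssSelector": ".text-gray-700" }
    ]
  }
}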
10. Analyze Abstract Using OpenAI LangChain Node
Add the OpenAI LangChain node to send the extracted abstract to GPT-4o (or specified model). Use a system prompt that instructs the AI to extract core introduction, keywords, data/results highlights, technical details, and classification, outputting in JSON format.
Example prompt snippet:
{
  "role": "system",
  "content": "Extract the following key details from the paper abstract:\n\nCore Introduction: Summarize the main contributions and objectives of the paper, highlighting its innovations and significance.\nKeyword Extraction: List 2-5 keywords that best represent the research direction and techniques of the paper.\nKey Data and Results: Extract important performance metrics, comparison results, and the paper's advantages over other studies.\nTechnical Details: Provide a brief overview of the methods, optimization techniques, and datasets mentioned in the paper.\nClassification: Assign an appropriate academic classification based on the content of the paper.\n\nOutput as json:\n{...}"
}
This enhances the paper metadata with AI-powered insights.
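In the prompt, the {...} placeholder stands for the JSON shape you want back. A plausible schema matching the five requested fields (key names here are illustrative, not prescribed by the workflow) looks like:
{
  "core_introduction": "…",
  "keywords": ["…", "…"],
  "key_data_and_results": "…",
  "technical_details": "…",
  "classification": "…"
}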
11. Store Processed Data into Notion
Use a Notion Create node to insert a new page into your Notion database with properties mapped, including URL, title, abstract (truncated to 2000 characters), scraping date (today), classification, technical details, data/results, keywords, and core introduction.
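The 2000-character truncation can be done inline when mapping the abstract property, since Notion caps a single rich text object at 2,000 characters. A sketch, assuming the extracted abstract arrives on the item as abstract:
{{ $json.abstract.slice(0, 2000) }}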
Customizations ✏️
- Adjust Schedule Timing: In the Schedule Trigger node, change the interval and trigger hours to match your preferred paper publication time. This lets you sync pulls with when new papers appear.
- Expand Extraction Details: Add more CSS selectors in the Extract Hugging Face Paper Abstract HTML node to retrieve author names or publication dates for richer metadata.
- Enhance AI Prompts: Modify the OpenAI prompt to extract additional insights like potential applications or limitations mentioned.
- Batch Size Control: Adjust batch size in the Split In Batches node for faster processing or API quota management.
- Notion Database Properties: Customize your Notion database schema to include additional fields like paper DOI or citation count and map them accordingly.
Troubleshooting 🔧
- Problem: “No papers found or empty URL array after extraction”
Cause: Selector .line-clamp-3 might have changed in the Hugging Face website HTML
Solution: Inspect the Hugging Face papers page HTML to confirm the current selector; update the CSS selector accordingly in the HTML Extract node.
- Problem: “Notion duplicate check always returns false negatives or positives”
Cause: URL format mismatch or incorrect Notion query filter setup
Solution: Verify the exact URL string stored in Notion and the filter condition in the Notion node; ensure string formatting matches precisely (e.g., prepending ‘https://huggingface.co’ properly).
- Problem: “OpenAI API fails or times out”
Cause: Exceeded rate limits or invalid API key
Solution: Check API usage dashboard; rotate keys or add retry logic in n8n; ensure correct API key is configured in credentials.
Pre-Production Checklist ✅
- Verify Hugging Face URL endpoint and date parameter correctness.
- Confirm Notion database schema matches properties mapped in the workflow.
- Test API keys for OpenAI and Notion with simple calls before full run.
- Run workflow manually with test data to check if all nodes execute without errors.
- Backup your Notion database before injecting new pages to prevent data loss.
Deployment Guide
Activate the workflow in n8n by toggling it on. Make sure your credentials for OpenAI and Notion are valid and connected.
Monitor the workflow’s executions in the n8n dashboard, checking for errors or failed runs, and review the logs when troubleshooting.
You can also set up notifications or alerts for failures using additional nodes or third-party integrations.
FAQs
- Can I use another AI model instead of GPT-4o?
Yes, you can replace the OpenAI LangChain node’s model ID with any supported OpenAI model in n8n.
- Does this workflow consume many OpenAI API credits?
It depends on the number and length of abstracts processed. Limit your batch size or frequency to control costs.
- Is my data safe in Notion?
Notion uses encrypted storage and secure API access, but always review your organization’s compliance policies.
- Can this scale to hundreds of papers daily?
Yes, with batch processing and proper API quota management this workflow can handle large volumes.
Conclusion
By the end of this tutorial, you’ve built a robust automation that fetches new AI research papers from Hugging Face, analyzes their abstracts with OpenAI’s powerful language models, and organizes rich metadata inside Notion automatically.
This solution saves at least 5 hours per week, eliminates human error in data collection and note-taking, and provides enhanced insights that accelerate academic research.
Next, consider expanding this workflow to include automated citation fetching, integration with other academic sites like arXiv, or even personalized paper recommendation systems.
Now, go ahead and automate your academic research workflow with confidence!