Dynamic web scraping made easy with Node.js

Nolan Zhang

Co-Founder / CTO

Modern websites are complex. Long gone are the days when a simple HTTP request could grab all the data you needed. Today's web applications rely heavily on JavaScript to render content, fetch data asynchronously, and provide interactive experiences. This presents a significant challenge for traditional web scraping methods.

If you're a Node.js developer, you're likely looking for robust solutions to tackle these dynamic websites. Thankfully, powerful browser automation libraries like Puppeteer and Playwright exist. However, managing browser instances, handling proxies, dealing with CAPTCHAs, and scaling your scraping infrastructure can quickly become a major headache.

This is where ZenCrawl's Browser Scraping API comes in. It allows you to harness the power of real browsers (controlled by logic similar to Puppeteer or Playwright) without managing the underlying infrastructure. You simply send your scraping instructions, and ZenCrawl handles the rest, returning the rendered HTML or other specified data.

In this guide, we'll walk you through how to use Node.js along with ZenCrawl's Browser Scraping API, implementing the scraping logic using both Puppeteer and Playwright syntaxes.

Why Browser Automation is Essential for Modern Scraping

Traditional scrapers fetch the initial HTML source code. But if a website uses JavaScript to:

  • Load data after the initial page load (e.g., via AJAX/Fetch).
  • Render UI components dynamically (e.g., React, Vue, Angular apps).
  • Require user interaction (like clicks or scrolls) to reveal content.

Then the initial HTML source won't contain the data you need. Browser automation tools programmatically control a real browser, executing JavaScript just like a human user, ensuring you get the final, fully-rendered content.
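
You can see this for yourself with a plain HTTP request. The sketch below (assuming Node.js 18+, which ships a global fetch) requests the same JavaScript-rendered demo site we'll scrape later and checks whether the quote markup is present in the raw response:

// check-static.ts
// Fetch the initial HTML the server sends, before any JavaScript runs.
(async () => {
  const response = await fetch('https://quotes.toscrape.com/js/');
  const html = await response.text();

  // In the rendered page, every quote sits in a <div class="quote"> element,
  // but on this demo site those elements are built client-side, so the raw
  // response should not contain them.
  console.log(/<div class="quote">/.test(html)); // should print: false
})();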

Introducing ZenCrawl's Browser Scraping API

Instead of running headless browsers on your own servers (which requires significant resources and maintenance), the ZenCrawl Browser Scraping API lets you offload this task.

  • Simplicity: Send an API request with the target URL and browser instructions.
  • Scalability: ZenCrawl manages a pool of browsers, handling scaling automatically.
  • Reliability: Built-in proxy rotation and CAPTCHA handling significantly improve success rates.
  • Flexibility: Supports Puppeteer-like and Playwright-like code snippets for browser interaction.

You write the logic for what the browser should do, and ZenCrawl executes it in its managed environment.

Examples

Prerequisites

Before we start, make sure you have:

  • Node.js and npm (or yarn): Installed on your system.
  • A ZenCrawl Account: Sign up to get your API Key.
  • Basic Knowledge of JavaScript/Node.js: Understanding async/await is crucial.

First, create a project folder and initialize a package:

mkdir scraping-with-nodejs
cd scraping-with-nodejs
npm init -y
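
The examples below use puppeteer-core (which connects to a remote browser rather than downloading a local Chromium build) and run TypeScript files directly with tsx; ts-node would work just as well. Install the dependencies:

npm install puppeteer-core
npm install -D typescript tsx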

Using Puppeteer with ZenCrawl

Puppeteer is a popular Node.js library developed by Google for controlling Chrome/Chromium.

Case 1: Scrape a Website Title

Let's start with the simplest possible example: extracting a website's title.

// step-1.ts
import puppeteer from 'puppeteer-core';

(async () => {
  const connectionURL = 'wss://api.zencrawl.com/browser?apiKey=YOUR_API_KEY';
  const browser = await puppeteer.connect({ browserWSEndpoint: connectionURL });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();
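
Run it with tsx; it should print the page's title, Example Domain:

npx tsx step-1.ts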

It's worth noting that, due to anti-bot mechanisms, not every website will let you scrape it. This is exactly the problem that ZenCrawl's Browser Scraping API is designed to solve: it ships with pre-built measures for bypassing anti-bot protection, so your scrapers keep working without extra effort on your part.

Case 2: Use Premium Proxies

Using ZenCrawl's premium proxies is straightforward: just add two parameters to the connection URL:

// step-2.ts
import puppeteer from 'puppeteer-core';

(async () => {
  const connectionURL = 'wss://api.zencrawl.com/browser?apiKey=YOUR_API_KEY&proxy_premium=true&proxy_country=us';
  const browser = await puppeteer.connect({ browserWSEndpoint: connectionURL });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();

If you want to exit from another country, just change the value of the proxy_country parameter, which accepts ISO 3166 country codes.
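
For example, to route requests through Germany instead (de being the ISO 3166 code, with the other parameters unchanged from the example above):

wss://api.zencrawl.com/browser?apiKey=YOUR_API_KEY&proxy_premium=true&proxy_country=de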

Case 3: Scrape a Data List

In the following example, we'll perform a more complex task: scraping a full list of data from a JavaScript-rendered page.

// step-3.ts
import puppeteer from 'puppeteer-core';

(async () => {
  const connectionURL = 'wss://api.zencrawl.com/browser?apiKey=YOUR_API_KEY&proxy_premium=true&proxy_country=us';
  const browser = await puppeteer.connect({ browserWSEndpoint: connectionURL });
  const page = await browser.newPage();
  await page.goto('https://quotes.toscrape.com/js/');

  // scrape data list
  const quotesData = await page.evaluate(() => {
    const quoteElements = document.querySelectorAll('div.quote');
    const quotes: { text: string; author: string; tags: string[] }[] = []; // explicitly type the quotes array

    quoteElements.forEach((quoteElement) => {
      const textElement = quoteElement.querySelector('span.text');
      const authorElement = quoteElement.querySelector('small.author');
      const tagsElements = quoteElement.querySelectorAll('div.tags a.tag');
      const tags = Array.from(tagsElements)
        .map((tag) => (tag ? tag.textContent : null))
        .filter((tag) => tag !== null) as string[]; // drop nulls and assert the element type

      if (textElement && authorElement) {
        const textContent = textElement.textContent?.trim();
        const authorContent = authorElement.textContent?.trim();

        if (textContent && authorContent) {
          quotes.push({
            text: textContent.slice(1, -1),
            author: authorContent,
            tags: tags,
          });
        }
      }
    });

    return quotes;
  });
  console.log(`[puppeteer]rows:`, quotesData);

  // close the browser session
  await browser.close();
})();

Note that the function passed to page.evaluate is serialized and executed inside the browser, so it can't reference variables from your Node.js scope; pass them in as extra arguments to evaluate if you need to.

Execute the above code:

npx tsx step-3.ts

You should see the full list of quotes:

[puppeteer]rows: [
  {
    text: 'The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.',
    author: 'Albert Einstein',
    tags: [ 'change', 'deep-thoughts', 'thinking', 'world' ]
  },
  {
    text: 'It is our choices, Harry, that show what we truly are, far more than our abilities.',
    author: 'J.K. Rowling',
    tags: [ 'abilities', 'choices' ]
  },
  {
    text: 'There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.',
    author: 'Albert Einstein',
    tags: [ 'inspirational', 'life', 'live', 'miracle', 'miracles' ]
  },
  {
    text: 'The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.',
    author: 'Jane Austen',
    tags: [ 'aliteracy', 'books', 'classic', 'humor' ]
  },
  {
    text: "Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.",
    author: 'Marilyn Monroe',
    tags: [ 'be-yourself', 'inspirational' ]
  },
  {
    text: 'Try not to become a man of success. Rather become a man of value.',
    author: 'Albert Einstein',
    tags: [ 'adulthood', 'success', 'value' ]
  },
  {
    text: 'It is better to be hated for what you are than to be loved for what you are not.',
    author: 'André Gide',
    tags: [ 'life', 'love' ]
  },
  {
    text: "I have not failed. I've just found 10,000 ways that won't work.",
    author: 'Thomas A. Edison',
    tags: [ 'edison', 'failure', 'inspirational', 'paraphrased' ]
  },
  {
    text: "A woman is like a tea bag; you never know how strong it is until it's in hot water.",
    author: 'Eleanor Roosevelt',
    tags: [ 'misattributed-eleanor-roosevelt' ]
  },
  {
    text: 'A day without sunshine is like, you know, night.',
    author: 'Steve Martin',
    tags: [ 'humor', 'obvious', 'simile' ]
  }
]

All the examples above are available in our GitHub repository: https://github.com/ZenCrawl/Examples
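
Using Playwright with ZenCrawl

Playwright is Microsoft's browser automation library, and its API is very similar to Puppeteer's. The sketch below is a minimal adaptation of Case 1, assuming ZenCrawl's WebSocket endpoint accepts standard Chrome DevTools Protocol (CDP) connections; note that Playwright's connectOverCDP only supports Chromium, and you should check the ZenCrawl documentation for the exact connection details:

// playwright-step-1.ts
import { chromium } from 'playwright-core';

(async () => {
  const connectionURL = 'wss://api.zencrawl.com/browser?apiKey=YOUR_API_KEY';
  // Attach to the remote Chromium instance over CDP.
  const browser = await chromium.connectOverCDP(connectionURL);
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();

The rest of the Puppeteer examples translate almost line for line; the main differences are the connection call and Playwright's context-based page creation.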

Key Considerations & Best Practices

  • Waiting Strategy: Waiting for a fixed time (waitForTimeout) is often unreliable. Use waitForSelector, waitForNavigation, waitForFunction, or Playwright's Locators with waitFor to wait for specific elements or network activity instead (see the sketch after this list).
  • Error Handling: Wrap your browser logic in try...catch blocks to handle cases where elements might not be found. Also handle potential errors from the ZenCrawl connection itself (network issues, an invalid API key, etc.).
  • Selectors: Use specific and stable selectors (IDs, data attributes) whenever possible. Avoid relying on brittle CSS classes or complex XPath expressions that might change frequently.
  • Return Data: Only return the specific data you need from page.evaluate to minimize payload size and processing time. Returning the entire HTML (page.content()) is possible but can be large.
  • Debugging: Debugging code that runs in a remote browser can be tricky. Develop and test your Puppeteer/Playwright logic locally against the target site first, then switch the connection URL to ZenCrawl. ZenCrawl may capture console.log output in its session logs or error messages (check the ZenCrawl documentation).
  • Ethics & Legality: Always check a website's robots.txt file and Terms of Service before scraping. Scrape responsibly, avoid overloading servers, and respect data privacy.

Conclusion

Scraping dynamic, JavaScript-heavy websites with Node.js doesn't have to be an infrastructure nightmare. By combining the flexibility of Node.js and the browser interaction logic of Puppeteer or Playwright with the managed infrastructure of ZenCrawl's Browser Scraping API, you can build powerful, scalable, and reliable web scrapers.

This approach lets you focus on defining what data to extract and how to interact with the page, while ZenCrawl handles the complexities of running browsers at scale.

Ready to tackle modern web scraping challenges? Get started with ZenCrawl today and explore the Browser Scraping API Documentation for more advanced features and options!