Integrating Scraping Browser with Puppeteer
The AutoScraper Scraping Browser is a fully managed, out-of-the-box solution that simplifies web scraping. It integrates seamlessly with Puppeteer and lets you scrape websites through our infrastructure and a rotating residential pool of over 55 million IPs across 190+ countries, with 99.9% uptime.
This guide will show you how to integrate Scraping Browser into your Puppeteer script with just one line of code, providing a fast and easy way to enhance your scraping capabilities.
Installing the required libraries
Puppeteer-Core is a lightweight version of the Puppeteer library, designed specifically to connect to an existing browser instance rather than launch a new one.
You can install puppeteer-core by running:
npm install puppeteer-core
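The examples in this guide use ES module import syntax. If your project uses CommonJS, either switch the snippets to require() or mark the package as an ES module; a minimal package.json sketch (the version shown is illustrative):

{
  "type": "module",
  "dependencies": {
    "puppeteer-core": "^22.0.0"
  }
}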
Quick Integration with Scraping Browser Without the SDK
If you already have a Puppeteer setup running smoothly, integrating with AutoScraper Scraping Browser is simple. For example, suppose your existing code uses the standard puppeteer package, which downloads and launches its own local browser:
import puppeteer from 'puppeteer';
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();
All you need to do is change one line: swap the launch() call for puppeteer.connect(), passing the WebSocket endpoint of the AutoScraper Scraping Browser (this is where puppeteer-core comes in). With just that change, you get the same functionality backed by AutoScraper’s infrastructure. Here’s the updated code:
import puppeteer from 'puppeteer-core';
(async () => {
  const browserWSEndpoint = 'wss://api.autoscraper.pro/browser?apiKey=YOUR_API_KEY&proxy_country=us';
  const browser = await puppeteer.connect({ browserWSEndpoint });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();
And that’s it! With a single change, you’re now using the AutoScraper Scraping Browser, fully equipped with the scalability, IP rotation, and global access that come with our service.
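In practice, you’ll want to keep the API key out of your source code. A minimal sketch that reads it from an environment variable instead (the AUTOSCRAPER_API_KEY name is our own choice, not part of the service):

import puppeteer from 'puppeteer-core';
(async () => {
  // Read the API key from the environment rather than hardcoding it.
  const apiKey = process.env.AUTOSCRAPER_API_KEY;
  const browserWSEndpoint = `wss://api.autoscraper.pro/browser?apiKey=${apiKey}&proxy_country=us`;
  const browser = await puppeteer.connect({ browserWSEndpoint });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();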
Practical Use Cases
Here are some common Puppeteer operations integrated with the AutoScraper Scraping Browser. Each snippet assumes a browser instance connected via puppeteer.connect() as shown above.
Navigation and Page Content Extraction
Navigate to a webpage, extract the content, and scrape data:
const page = await browser.newPage();
console.log('Navigating...');
await page.goto('https://www.example.com');
console.log(await page.title());
console.log('Scraping page content...');
const html = await page.content();
console.log(html);
await browser.close();
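The snippet above dumps the full HTML. If you only need specific elements, Puppeteer’s $$eval runs a selector inside the page and returns serializable results; run this before browser.close() (the 'a' selector is just illustrative):

// Collect the text and href of every link on the page.
const links = await page.$$eval('a', (anchors) =>
  anchors.map((a) => ({ text: a.textContent.trim(), href: a.href }))
);
console.log(links);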
Taking a Screenshot
Capture screenshots during navigation:
const page = await browser.newPage();
console.log('Navigating...');
await page.goto('https://www.example.com');
console.log(await page.title());
console.log('Taking screenshot...');
await page.screenshot({ path: 'example.png' });
console.log('Screenshot saved as example.png');
await browser.close();
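page.screenshot() accepts further options; for instance, fullPage captures the entire scrollable page rather than just the visible viewport (again, run before browser.close()):

// Capture the whole scrollable page instead of only the viewport.
await page.screenshot({ path: 'example-full.png', fullPage: true });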
Running Custom Code
Execute JavaScript directly in the page context with page.evaluate():
const page = await browser.newPage();
console.log('Navigating...');
await page.goto('https://www.example.com');
const result = await page.evaluate(() => {
  return document.title;
});
console.log('Page title:', result);
await browser.close();
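page.evaluate() can also take arguments, which Puppeteer serializes into the browser context. A small sketch using an illustrative selector:

// Pass a selector into the page context and return the matched element's text.
const heading = await page.evaluate((selector) => {
  const el = document.querySelector(selector);
  return el ? el.textContent : null;
}, 'h1');
console.log('First heading:', heading);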
Troubleshooting
Connection Refused Errors
If you encounter a Connection Refused error when attempting to connect to the Scraping Browser, it’s likely due to one of the following (a minimal connection check with error logging is sketched after this list):
- API Key Issues: Ensure that you’re using the correct API key.
- Network Issues: Check your internet connection and firewall settings.
- WebSocket Endpoint: Make sure you’re connecting to the correct WebSocket URL (wss://api.autoscraper.pro/browser).
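To narrow down the cause, catch the connection error and inspect its message. A minimal sketch:

import puppeteer from 'puppeteer-core';
(async () => {
  const browserWSEndpoint = 'wss://api.autoscraper.pro/browser?apiKey=YOUR_API_KEY';
  try {
    const browser = await puppeteer.connect({ browserWSEndpoint });
    console.log('Connected to Scraping Browser');
    await browser.close();
  } catch (error) {
    // Authentication failures and network errors surface here with different messages.
    console.error('Connection failed:', error.message);
  }
})();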
Timeout Errors
If Puppeteer times out when trying to load a page, consider:
- Slow Websites: Increase the timeout value in Puppeteer by passing a timeout option:
await page.goto('https://example.com', { timeout: 60000 }); // 60 seconds
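If several navigations on the same page need a longer limit, you can raise the default once instead of passing a timeout on every call:

// Apply a 60-second limit to all subsequent navigations on this page.
page.setDefaultNavigationTimeout(60000);
await page.goto('https://example.com');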