Crawler API

The Crawler API allows you to fetch and monitor content changes on websites by crawling their sitemaps. You can query for changes from a specific date and get insights into the evolution of a website's content.


Crawl Sitemap Change

This endpoint allows you to crawl a sitemap and retrieve content changes since a specific date. The request takes the URL of the website; the crawler reads the site's robots.txt to locate the sitemap.xml. You can specify the date from which changes should be tracked.

Request Body

The body of the request should contain the following JSON schema to define the query parameters:

  • Name
    url
    Type
    string
    Description

    The URL of the website to crawl.

  • Name
    max_depth
    Type
    number
    Description

    Maximum crawl depth (0 for unlimited). Defaults to 3.

  • Name
    max_date
    Type
    Date
    Description

    The start date for crawling, in ISO 8601 format (e.g. 2024-12-30T14:42:19.768Z). Only content modified after this date will be returned.

  • Name
    max_pages
    Type
    number
    Description

    Total pages to crawl (1–10,000). Defaults to 1,000.

  • Name
    proxy_country
    Type
    string
    Description

    Geolocation of the IP address used to make the request. Available only with Premium Proxies.

Example:

{
  "url": "https://www.example.com/",
  "filter": "2024-12-30T14:42:19.768Z"
}
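As a sketch, the request body above can be assembled and serialized in Python before sending. The field names (`url`, `filter`) are taken from the example, and the optional tuning fields from the parameter list; this is an illustration, not a definitive client.

```python
import json

# Request body for POST /v1/crawler/sitemap, following the example above.
# "filter" carries the ISO 8601 cutoff; only content modified after it is returned.
payload = {
    "url": "https://www.example.com/",
    "filter": "2024-12-30T14:42:19.768Z",
    # Optional knobs from the parameter list (values shown are the defaults):
    "max_depth": 3,     # 0 = unlimited
    "max_pages": 1000,  # 1-10,000
}

body = json.dumps(payload)
```

The serialized `body` is what you would pass as the request data, as in the curl example in the Request section below.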

Request

POST
/v1/crawler/sitemap
curl -X POST https://api.autoscraper.pro/v1/crawler/sitemap \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{"url": "https://www.example.com", "filter": "2024-12-30T14:42:19.768Z"}'

Response

[
  {
    "loc": "https://www.example.com/sitemap.xml",
    "lastmod": "2024-01-05T15:00:00Z"
  },
  …
]

Response Properties

The following properties are included in the response for the sitemap crawling:

  • Name
    loc
    Type
    string
    Description

    The URL of the page or nested sitemap.

  • Name
    lastmod
    Type
    timestamp
    Description

    Timestamp of when the page or sitemap was last modified.
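A minimal sketch of post-processing the response: parse the returned array and keep only the entries whose `lastmod` falls after a cutoff date. The entry shape (`loc`, `lastmod`) follows the properties above; the helper name and sample data are illustrative.

```python
import json
from datetime import datetime

def entries_after(response_body: str, cutoff_iso: str) -> list:
    """Return sitemap entries whose lastmod is after the ISO 8601 cutoff."""
    # fromisoformat() does not accept a trailing "Z", so normalize it.
    cutoff = datetime.fromisoformat(cutoff_iso.replace("Z", "+00:00"))
    entries = json.loads(response_body)
    return [
        e for e in entries
        if datetime.fromisoformat(e["lastmod"].replace("Z", "+00:00")) > cutoff
    ]

# Sample response in the shape shown above (second entry is hypothetical).
sample = json.dumps([
    {"loc": "https://www.example.com/sitemap.xml", "lastmod": "2024-01-05T15:00:00Z"},
    {"loc": "https://www.example.com/old-page", "lastmod": "2023-06-01T09:30:00Z"},
])

recent = entries_after(sample, "2024-01-01T00:00:00Z")
# Keeps only the sitemap entry modified on 2024-01-05.
```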


Use Cases

  • Competitor Price Monitoring

    Crawl competitor sites daily, feed product URLs to Scraper APIs for price extraction.

  • SEO Audits

    Map site structure to identify broken links or orphaned pages.

  • Archival Systems

    Recursively crawl and snapshot entire domains for compliance.


Why Choose This API?

  • Precision Over Brute Force:

    AI-guided crawling avoids infinite loops and irrelevant pages.

  • End-to-End Integration:

    Pass crawled URLs directly to Article Extraction or Screenshot APIs.

  • Zero-Blocking Guarantee:

    Residential proxies + Scraping Browser ensure 99.9% success rates.
