Crawler API

The Crawler API allows you to fetch and monitor content changes on websites by crawling their sitemaps. You can query for changes from a specific date and get insights into the evolution of a website's content.


Crawl Sitemap Change

This endpoint allows you to crawl a sitemap and retrieve content changes since a specific date. The request takes the URL of the website; the crawler reads the site's robots.txt to locate the sitemap.xml. You can specify the date from which changes should be tracked.

Request Body

The body of the request should contain the following JSON schema to define the query parameters:

  • Name
    url
    Type
    string
    Description

    The URL of the website to crawl.

  • Name
    max_depth
    Type
    number
    Description

    Maximum crawl depth (0 for unlimited). Defaults to 3.

  • Name
    max_date
    Type
    Date
    Description

    The start date for crawling, in ISO 8601 format (e.g. 2024-12-30T14:42:19.768Z). Only content modified after this date will be returned.

  • Name
    max_pages
    Type
    number
    Description

    Total pages to crawl (1–10,000). Defaults to 1,000.

  • Name
    proxy_country
    Type
    string
    Description

    Geolocation of the IP address used to make the request. Available only with Premium Proxies.

Example:

{
  "url": "https://www.example.com/",
  "filter": "2024-12-30T14:42:19.768Z"
}
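As a sketch, the request body above can be assembled and serialized in Python before sending. The field names (`url`, `filter`) are taken from the example, and the optional tuning fields from the parameter list; this is an illustration, not a definitive client.

```python
import json

# Request body for POST /v1/crawler/sitemap, following the example above.
# "filter" carries the ISO 8601 cutoff; only content modified after it is returned.
payload = {
    "url": "https://www.example.com/",
    "filter": "2024-12-30T14:42:19.768Z",
    # Optional knobs from the parameter list (values shown are the defaults):
    "max_depth": 3,     # 0 = unlimited
    "max_pages": 1000,  # 1-10,000
}

body = json.dumps(payload)
```

The serialized `body` is what you would pass as the request data, as in the curl example in the Request section below.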

Request

POST
/v1/crawler/sitemap
curl -X POST https://api.autoscraper.pro/v1/crawler/sitemap \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{"url": "https://www.example.com", "filter": "2024-12-30T14:42:19.768Z"}'

Response

[
  {
    "loc": "https://www.example.com/sitemap.xml",
    "lastmod": "2024-01-05T15:00:00Z"
  },
  …
]

Response Properties

The following properties are included in the response for the sitemap crawling:

  • Name
    loc
    Type
    string
    Description

    The URL of the page or nested sitemap.

  • Name
    lastmod
    Type
    timestamp
    Description

    Timestamp of when the page or sitemap was last modified.
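A minimal sketch of post-processing the response: parse the returned array and keep only the entries whose `lastmod` falls after a cutoff date. The entry shape (`loc`, `lastmod`) follows the properties above; the helper name and sample data are illustrative.

```python
import json
from datetime import datetime

def entries_after(response_body: str, cutoff_iso: str) -> list:
    """Return sitemap entries whose lastmod is after the ISO 8601 cutoff."""
    # fromisoformat() does not accept a trailing "Z", so normalize it.
    cutoff = datetime.fromisoformat(cutoff_iso.replace("Z", "+00:00"))
    entries = json.loads(response_body)
    return [
        e for e in entries
        if datetime.fromisoformat(e["lastmod"].replace("Z", "+00:00")) > cutoff
    ]

# Sample response in the shape shown above (second entry is hypothetical).
sample = json.dumps([
    {"loc": "https://www.example.com/sitemap.xml", "lastmod": "2024-01-05T15:00:00Z"},
    {"loc": "https://www.example.com/old-page", "lastmod": "2023-06-01T09:30:00Z"},
])

recent = entries_after(sample, "2024-01-01T00:00:00Z")
# Keeps only the sitemap entry modified on 2024-01-05.
```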


Use Cases

  • Competitor Price Monitoring

    Crawl competitor sites daily, feed product URLs to Scraper APIs for price extraction.

  • SEO Audits

    Map site structure to identify broken links or orphaned pages.

  • Archival Systems

    Recursively crawl and snapshot entire domains for compliance.


Why Choose This API?

  • Precision Over Brute Force:

    AI-guided crawling avoids infinite loops and irrelevant pages.

  • End-to-End Integration:

    Pass crawled URLs directly to Article Extraction or Screenshot APIs.

  • Zero-Blocking Guarantee:

    Residential proxies + Scraping Browser ensure 99.9% success rates.
