Crawler API

The Crawler API lets you monitor content changes on a website by crawling its sitemap. You can query for pages changed since a specific date and track how the site's content evolves over time.


POST /v1/crawler/sitemap

Crawl Sitemap Changes

This endpoint crawls a website's sitemap and returns the content that changed since a given date. The request takes the URL of the website, whose robots.txt file must contain the link to its sitemap.xml, since that is how the crawler locates the sitemap. Use the filter field to set the date from which changes are tracked.
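Locating a sitemap through robots.txt relies on the `Sitemap:` directive from the sitemaps.org protocol. The sketch below illustrates that lookup with Python's standard `urllib.robotparser` (Python 3.8+). It is not the service's implementation, and `find_sitemaps` is a hypothetical helper name.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def find_sitemaps(robots_txt: str) -> List[str]:
    """Return the sitemap URLs declared by `Sitemap:` lines in robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines; Sitemap directives are collected
    # regardless of which User-agent group they appear in.
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemap.
    return parser.site_maps() or []


robots = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""
print(find_sitemaps(robots))  # ['https://www.example.com/sitemap.xml']
```

If this returns an empty list for a site, the crawler has no sitemap link to follow.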

Request Body

The request body is a JSON object with the following fields:

  • Name
    url
    Type
    string
    Description

    The URL of the website.

  • Name
    filter
    Type
    string
    Description

    The start date for tracking changes, as an ISO 8601 UTC timestamp such as 2024-12-30T14:42:19.768Z. Only content modified after this date is returned.

Example:

{
  "url": "https://www.example.com/",
  "filter": "2024-12-30T14:42:19.768Z"
}
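When building this body in code, the filter value needs millisecond precision and a trailing Z, as in the example. A minimal Python sketch, assuming naive datetimes should be treated as UTC; `format_filter` and `build_body` are hypothetical helper names:

```python
import json
from datetime import datetime, timezone


def format_filter(since: datetime) -> str:
    """Format a datetime like 2024-12-30T14:42:19.768Z (UTC, milliseconds)."""
    if since.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC.
        since = since.replace(tzinfo=timezone.utc)
    utc = since.astimezone(timezone.utc)
    # isoformat() yields "...19.768+00:00"; replace the offset with "Z".
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_body(url: str, since: datetime) -> str:
    """Serialize the documented request fields `url` and `filter` as JSON."""
    return json.dumps({"url": url, "filter": format_filter(since)})


since = datetime(2024, 12, 30, 14, 42, 19, 768000, tzinfo=timezone.utc)
print(build_body("https://www.example.com/", since))
# {"url": "https://www.example.com/", "filter": "2024-12-30T14:42:19.768Z"}
```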

Request

curl -X POST https://api.autoscraper.pro/v1/crawler/sitemap \
  -H 'x-api-key: {API_KEY}' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://www.example.com", "filter": "2024-12-30T14:42:19.768Z"}'
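The same call can be sketched in Python using only the standard library. This assumes the endpoint accepts a JSON body sent with `Content-Type: application/json` and replies with a JSON array; `build_request` and `crawl_sitemap` are hypothetical names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.autoscraper.pro/v1/crawler/sitemap"


def build_request(api_key: str, url: str, since: str) -> urllib.request.Request:
    """Build the POST from the curl example without sending it."""
    body = json.dumps({"url": url, "filter": since}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def crawl_sitemap(api_key: str, url: str, since: str) -> list:
    """Send the request and decode the JSON array in the response."""
    req = build_request(api_key, url, since)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `crawl_sitemap("YOUR_API_KEY", "https://www.example.com", "2024-12-30T14:42:19.768Z")` would return a list of dictionaries with `loc` and `lastmod` keys, if the assumptions above hold. Splitting construction from sending keeps the request easy to inspect without network access.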

Response

[
  {
    "loc": "https://www.example.com/sitemap.xml",
    "lastmod": "2024-01-05T15:00:00Z"
  },
  ……
]
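After decoding the array, ordering entries by lastmod makes the most recent changes easy to review. A sketch, assuming every entry carries both loc and lastmod as ISO 8601 strings (offset-less values are treated as UTC); `parse_lastmod` and `newest_first` are hypothetical helpers:

```python
import json
from datetime import datetime, timezone
from typing import List, Tuple


def parse_lastmod(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2024-01-05T15:00:00Z as UTC."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onward,
    # so normalize it to an explicit offset first.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return dt


def newest_first(response_text: str) -> List[Tuple[str, datetime]]:
    """Return (loc, lastmod) pairs from the response array, newest first."""
    entries = json.loads(response_text)
    pairs = [(e["loc"], parse_lastmod(e["lastmod"])) for e in entries]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```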

Response Fields

Each object in the response array has the following properties:

  • Name
    loc
    Type
    string
    Description

    The URL of the page or sitemap.

  • Name
    lastmod
    Type
    string (ISO 8601 timestamp)
    Description

    Timestamp of when the page or sitemap was last modified.
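The field names loc and lastmod appear to mirror the `<loc>` and `<lastmod>` elements of the sitemaps.org protocol. To illustrate what "modified after the filter date" means on a raw sitemap, here is a client-side sketch using `xml.etree.ElementTree`. It is an assumption about the comparison, not the service's code; `changed_since` is a hypothetical helper, and entries without `<lastmod>` are simply skipped.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, List

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def _to_utc(value: str) -> datetime:
    """Parse a date or date-time string as an aware UTC datetime."""
    dt = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def changed_since(sitemap_xml: str, since: str) -> List[Dict[str, str]]:
    """Return loc/lastmod dicts for entries modified strictly after `since`."""
    cutoff = _to_utc(since)
    root = ET.fromstring(sitemap_xml)
    results = []
    # A <urlset> holds <url> entries; a sitemap index holds <sitemap> entries.
    for entry in root.findall("sm:url", NS) + root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        # Without a lastmod there is nothing to compare, so skip the entry.
        if loc and lastmod and _to_utc(lastmod) > cutoff:
            results.append({"loc": loc.strip(), "lastmod": lastmod.strip()})
    return results
```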

