Crawler API
The Crawler API lets you fetch and monitor content changes on websites by crawling their sitemaps. You can query for changes since a specific date and track how a website's content evolves over time.
Crawl Sitemap Changes
This endpoint crawls a sitemap and returns the content that has changed since a specific date. The request requires the URL of the robots.txt file, which links to the site's sitemap.xml. Use the filter field to set the date from which changes are tracked.
Request Body
The request body is a JSON object with the following fields:
- url (string): The URL of the website.
- filter (string): The date from which to track changes, as an ISO 8601 UTC timestamp such as 2024-12-30T14:42:19.768Z. Only content modified after this date is returned.
Example:
{
  "url": "https://www.example.com/",
  "filter": "2024-12-30T14:42:19.768Z"
}
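If you build the request body programmatically, the filter value must match the format shown in the example. The Python sketch below assumes that format is an ISO 8601 UTC timestamp with millisecond precision and a trailing Z. The helper names format_filter and build_body are hypothetical and are not part of the API.

```python
import json
from datetime import datetime, timezone


def format_filter(moment: datetime) -> str:
    """Format a datetime like the documented example: UTC, milliseconds, trailing 'Z'."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are meant to be UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    utc = moment.astimezone(timezone.utc)
    # isoformat() yields e.g. 2024-12-30T14:42:19.768+00:00; swap the offset for 'Z'.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_body(site_url: str, since: datetime) -> str:
    """Serialize the request body with the url and filter fields (hypothetical helper)."""
    return json.dumps({"url": site_url, "filter": format_filter(since)})
```

For example, format_filter(datetime(2024, 12, 30, 14, 42, 19, 768000, tzinfo=timezone.utc)) returns "2024-12-30T14:42:19.768Z". The sketch treats naive datetimes as UTC. Adjust this if your timestamps are in local time.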
Request
curl -X POST https://api.autoscraper.pro/v1/crawler/sitemap \
  -H 'x-api-key: {API_KEY}' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://www.example.com", "filter": "2024-12-30T14:42:19.768Z"}'
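The same call can be made from Python with only the standard library. This is a sketch, not an official client. The helper names build_request and crawl_sitemap_changes are hypothetical. The endpoint and the x-api-key header come from the curl example, while the JSON Content-Type header and the 30-second timeout are assumptions.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
ENDPOINT = "https://api.autoscraper.pro/v1/crawler/sitemap"


def build_request(api_key: str, site_url: str, filter_ts: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command sends."""
    body = json.dumps({"url": site_url, "filter": filter_ts}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "x-api-key": api_key,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def crawl_sitemap_changes(api_key: str, site_url: str, filter_ts: str) -> list:
    """Send the request and decode the JSON array of {loc, lastmod} entries."""
    req = build_request(api_key, site_url, filter_ts)
    # Assumption: a 30-second timeout is reasonable for a sitemap crawl.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling crawl_sitemap_changes("YOUR_API_KEY", "https://www.example.com", "2024-12-30T14:42:19.768Z") should return the decoded response array. Keeping request construction separate from sending lets you inspect the headers and body before any network call. Non-2xx responses surface as urllib.error.HTTPError.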
Response
[
  {
    "loc": "https://www.example.com/sitemap.xml",
    "lastmod": "2024-01-05T15:00:00Z"
  },
  ……
]
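To work with the response array, a client typically parses each lastmod into a date. The Python sketch below assumes lastmod values are ISO 8601 strings like the one in the example. The names parse_lastmod and parse_changes are hypothetical. The sketch sorts entries newest first and treats timestamps without an offset as UTC, which is also an assumption.

```python
import json
from datetime import datetime, timezone
from typing import List, Tuple


def parse_lastmod(value: str) -> datetime:
    """Parse an ISO 8601 lastmod value into a timezone-aware datetime."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: values without an offset (e.g. "2024-01-05") are UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def parse_changes(body: str) -> List[Tuple[str, datetime]]:
    """Turn the JSON response array into (loc, lastmod) pairs, newest first."""
    entries = json.loads(body)
    pairs = [(entry["loc"], parse_lastmod(entry["lastmod"])) for entry in entries]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```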
Response Properties
Each object in the response array contains the following properties:
- loc (string): The URL of the page or sitemap.
- lastmod (timestamp): When the page or sitemap was last modified, as an ISO 8601 timestamp (for example, 2024-01-05T15:00:00Z).
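Because lastmod is an ISO 8601 timestamp comparable to the filter request field, one way to monitor a site over time is to pass the newest lastmod from one response as the filter for the next request. The sketch below assumes filter is exclusive ("modified after this date"), so entries already seen are not returned again. The helper next_filter is hypothetical and is not part of the API.

```python
from datetime import datetime, timezone
from typing import Dict, List


def _parse(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating 'Z' or a missing offset as UTC."""
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    dt = datetime.fromisoformat(ts)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def next_filter(entries: List[Dict[str, str]], previous: str) -> str:
    """Return the filter for the next poll: the newest lastmod seen, else the previous filter."""
    newest = _parse(previous)
    for entry in entries:
        modified = _parse(entry["lastmod"])
        if modified > newest:
            newest = modified
    # Format like the documented filter example: UTC, milliseconds, trailing 'Z'.
    return newest.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

If the API turns out to treat filter as inclusive, this approach would return the newest entry again on the next poll, so deduplicate on loc and lastmod in that case.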