Crawler API
The Crawler API allows you to fetch and monitor content changes on websites by crawling their sitemaps. You can query for changes from a specific date and get insights into the evolution of a website's content.
Crawl Sitemap Change
This endpoint allows you to crawl a sitemap and retrieve content changes since a specific date. The request requires the URL of the robots.txt file, which contains the link to the sitemap.xml. You can specify the date from which changes should be tracked.
Request Body
The request body is a JSON object with the following fields:
- Name
url
- Type
- string
- Description
The URL of the website.
- Name
max_depth
- Type
- number
- Description
Maximum crawl depth (0 for unlimited). Default is 3.
- Name
max_date
- Type
- Date
- Description
The start date for crawling, formatted as 2024-12-30T14:42:19.768Z. Only content modified after this date will be returned.
- Name
max_pages
- Type
- number
- Description
Total pages to crawl (1–10,000). Default is 1000.
- Name
proxy_country
- Type
- string
- Description
Geolocation of the IP address used to make the request. Available only with Premium Proxies.
Example:
{
  "url": "https://www.example.com/",
  "filter": "2024-12-30T14:42:19.768Z"
}
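As a rough illustration, the request body could be assembled and range-checked in Python before it is sent. This is a minimal sketch, not the official client: `build_payload`, `format_timestamp`, and the `date_key` argument are hypothetical names. The parameter list calls the date field `max_date` while the example body uses `filter`; the sketch follows the example and leaves the key configurable. Treating naive datetimes as UTC is also an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def format_timestamp(moment: datetime) -> str:
    """Format a datetime like 2024-12-30T14:42:19.768Z (UTC, milliseconds)."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_payload(
    url: str,
    since: Optional[datetime] = None,
    max_depth: Optional[int] = None,
    max_pages: Optional[int] = None,
    proxy_country: Optional[str] = None,
    date_key: str = "filter",  # hypothetical switch; the docs also mention "max_date"
) -> dict:
    """Build the JSON body, omitting fields that were not supplied."""
    payload = {"url": url}
    if since is not None:
        payload[date_key] = format_timestamp(since)
    if max_depth is not None:
        if max_depth < 0:
            raise ValueError("max_depth must be 0 (unlimited) or greater")
        payload["max_depth"] = max_depth
    if max_pages is not None:
        if not 1 <= max_pages <= 10_000:
            raise ValueError("max_pages must be between 1 and 10,000")
        payload["max_pages"] = max_pages
    if proxy_country is not None:
        payload["proxy_country"] = proxy_country
    return payload
```

Omitted fields are left out of the body so that the documented server-side defaults (depth 3, 1000 pages) apply.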
Request
curl -X POST https://api.autoscraper.pro/v1/crawler/sitemap \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://www.example.com", "filter": "2024-12-30T14:42:19.768Z"}'
Response
[
  {
    "loc": "https://www.example.com/sitemap.xml",
    "lastmod": "2024-01-05T15:00:00Z"
  },
  ……
]
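The same call can be sketched in Python using only the standard library. The endpoint URL and bearer token scheme come from the curl example. The function names are hypothetical. The sketch sends the body as JSON with an explicit Content-Type header, which is assumed to be what the endpoint expects. It also assumes the response is a JSON array, as in the sample.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
ENDPOINT = "https://api.autoscraper.pro/v1/crawler/sitemap"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def crawl_sitemap(api_key: str, payload: dict, timeout: float = 30.0) -> list:
    """Send the request and decode the JSON response."""
    request = build_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the headers and body easy to inspect without touching the network.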
Response
Each entry in the response array contains the following properties:
- Name
loc
- Type
- string
- Description
The URL of the page or sitemap.
- Name
lastmod
- Type
- timestamp
- Description
Timestamp of when the page or sitemap was last modified.
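Once decoded, the entries can be filtered client-side on `lastmod`. The following is a small sketch with hypothetical helper names. It assumes `lastmod` is always an ISO 8601 UTC timestamp like the one in the sample, and that entries without a `lastmod` should be skipped.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as 2024-01-05T15:00:00Z into aware datetimes."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so it is normalized to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def modified_after(entries: list, cutoff: str) -> list:
    """Return the loc of every entry whose lastmod is later than cutoff."""
    threshold = parse_timestamp(cutoff)
    return [
        entry["loc"]
        for entry in entries
        # Assumption: entries without lastmod are ignored.
        if "lastmod" in entry and parse_timestamp(entry["lastmod"]) > threshold
    ]
```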
Use Cases
- Competitor Price Monitoring
Crawl competitor sites daily and feed product URLs to Scraper APIs for price extraction.
- SEO Audits
Map site structure to identify broken links or orphaned pages.
- Archival Systems
Recursively crawl and snapshot entire domains for compliance.
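For a daily monitoring job, one possible approach is to keep the previous run's `loc` to `lastmod` mapping and pass only new or changed URLs downstream. The sketch below is illustrative only: `snapshot` and `changed_urls` are hypothetical helpers, and how the snapshot is stored between runs is left to the caller.

```python
def snapshot(entries: list) -> dict:
    """Map each loc to its lastmod so the next run can compare against it."""
    return {entry["loc"]: entry.get("lastmod") for entry in entries}


def changed_urls(previous: dict, entries: list) -> list:
    """Return URLs that are new or whose lastmod differs from the last run."""
    changed = []
    for entry in entries:
        loc = entry["loc"]
        lastmod = entry.get("lastmod")
        if loc not in previous or previous[loc] != lastmod:
            changed.append(loc)
    return changed
```

After each run, `snapshot(entries)` would replace the stored mapping, and the result of `changed_urls` is the list to hand to the price extraction step.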
Why Choose This API?
- Precision Over Brute Force:
AI-guided crawling avoids infinite loops and irrelevant pages.
- End-to-End Integration:
Pass crawled URLs directly to Article Extraction or Screenshot APIs.
- Zero-Blocking Guarantee:
Residential proxies + Scraping Browser ensure 99.9% success rates.