Introduction
API V1
The Scrapex.ai API (version 1) provides programmatic access to the Scrapex platform. The API is organized around RESTful HTTP endpoints. All requests and responses (including errors) are encoded in JSON format with UTF-8 encoding.
Whether you're an API pro, a beginning developer, or a Scrapex.ai partner, our extensive API is ready for you. Our API suite allows you to run scripts, access data, and much more.
We have language bindings in Shell, Python, and JavaScript! You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.
Authentication
Authenticated Requests:
import requests
from requests.auth import HTTPBasicAuth

try:
    r = requests.get('https://api.scrapex.ai/v1/user/self/',
                     auth=HTTPBasicAuth('YOUR_API_KEY', ''))
    r.raise_for_status()
    print(r.json())
except requests.exceptions.HTTPError as err:
    print(err.response.json())
# With curl, you can just pass the correct header with each request
curl "https://api.scrapex.ai/v1/user/self" -u "YOUR_API_KEY:"
import fetch from "node-fetch";

(async () => {
  try {
    let res = await fetch("https://api.scrapex.ai/v1/user/self", {
      method: "GET",
      headers: {
        Authorization: `Basic ${Buffer.from("YOUR_API_KEY" + ":" + "").toString(
          "base64"
        )}`,
      },
    });
    let data = await res.json();
    console.log(data);
  } catch (e) {
    console.log(e);
  }
})();
Make sure to replace `YOUR_API_KEY` with your API key.

Scrapex uses API keys to authenticate requests. You can view and manage your API keys on the account settings page.

Your API keys carry many privileges, so be sure to keep them secure! Do not share your secret API keys in publicly accessible areas such as GitHub, client-side code, and so forth.

Authentication to the API is performed via HTTP Basic Auth. Provide your API key as the basic auth username value. You do not need to provide a password.

All API requests must be made over HTTPS. Calls made over plain HTTP will be redirected to HTTPS. API requests without authentication will also fail.
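If you are not using a library with built-in Basic Auth support, the `Authorization` header can be constructed by hand: base64-encode the API key followed by a colon and an empty password. A minimal Node.js sketch (the key below is a placeholder, not a real credential):

```javascript
// Build the HTTP Basic Auth header by hand: the API key is the
// username and the password is left empty ("YOUR_API_KEY:").
function basicAuthHeader(apiKey) {
  // Buffer is a Node.js global; no import needed.
  const encoded = Buffer.from(`${apiKey}:`).toString("base64");
  return `Basic ${encoded}`;
}

console.log(basicAuthHeader("YOUR_API_KEY"));
// Basic WU9VUl9BUElfS0VZOg==
```

This is exactly the value the `-u "YOUR_API_KEY:"` flag makes curl send for you.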
On-Demand Extract
Run an On-Demand Extract
import requests
from requests.auth import HTTPBasicAuth

try:
    r = requests.get('https://api.scrapex.ai/v1/scrapers/<SCRAPER_ID>/extract?url=<URL>',
                     auth=HTTPBasicAuth('YOUR_API_KEY', ''))
    r.raise_for_status()
    print(r.json())
except requests.exceptions.HTTPError as err:
    print(err.response.json())
curl "https://api.scrapex.ai/v1/scrapers/<SCRAPER_ID>/extract?url=<URL>" -u "YOUR_API_KEY:"
import fetch from "node-fetch";

(async () => {
  try {
    let res = await fetch(
      "https://api.scrapex.ai/v1/scrapers/<SCRAPER_ID>/extract?url=<URL>",
      {
        method: "GET",
        headers: {
          Authorization: `Basic ${Buffer.from(
            "YOUR_API_KEY" + ":" + ""
          ).toString("base64")}`,
        },
      }
    );
    let data = await res.json();
    console.log(data);
  } catch (e) {
    console.log(e);
  }
})();
The above command returns JSON structured like this:
{
"data": [
...
...
...
],
"errors": [
...
...
...
]
}
Given a URL, this endpoint runs an on-demand extract.
HTTP Request
GET https://api.scrapex.ai/v1/scrapers/<SCRAPER_ID>/extract
URL Parameters
Parameter | Description |
---|---|
SCRAPER_ID | The ID of the scraper of interest |
QUERY Parameters
Parameter | Description | Required |
---|---|---|
url | An encoded URL string | YES |
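The `url` query parameter must be URL-encoded, otherwise characters like `:`, `/`, and `?` in the target URL would break the request. A sketch of building the request URL in Node.js (the scraper ID here is a made-up placeholder):

```javascript
// Percent-encode the target URL before appending it as a query parameter.
function extractEndpoint(scraperId, targetUrl) {
  return (
    `https://api.scrapex.ai/v1/scrapers/${scraperId}/extract` +
    `?url=${encodeURIComponent(targetUrl)}`
  );
}

console.log(extractEndpoint("my-scraper", "https://example.com/page?id=1"));
// https://api.scrapex.ai/v1/scrapers/my-scraper/extract?url=https%3A%2F%2Fexample.com%2Fpage%3Fid%3D1
```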
Collection
Get records in a collection
import requests
from requests.auth import HTTPBasicAuth

try:
    r = requests.get('https://api.scrapex.ai/v1/projects/project-store/<PROJECT_ID>/<COLLECTION_NAME>',
                     auth=HTTPBasicAuth('YOUR_API_KEY', ''))
    r.raise_for_status()
    print(r.json())
except requests.exceptions.HTTPError as err:
    print(err.response.json())
curl "https://api.scrapex.ai/v1/projects/project-store/<PROJECT_ID>/<COLLECTION_NAME>" -u "YOUR_API_KEY:"
import fetch from "node-fetch";

(async () => {
  try {
    let res = await fetch(
      "https://api.scrapex.ai/v1/projects/project-store/<PROJECT_ID>/<COLLECTION_NAME>",
      {
        method: "GET",
        headers: {
          Authorization: `Basic ${Buffer.from(
            "YOUR_API_KEY" + ":" + ""
          ).toString("base64")}`,
        },
      }
    );
    let data = await res.json();
    console.log(data);
  } catch (e) {
    console.log(e);
  }
})();
The above command returns JSON structured like this:
{
"total_count": 2,
"count": 2,
"offset": 0,
"data": [
{
"id": 0,
"name": "..."
},
{
"id": 1,
"name": "..."
}
]
}
This endpoint retrieves the records in a collection belonging to a project.
HTTP Request
GET https://api.scrapex.ai/v1/projects/project-store/<PROJECT_ID>/<COLLECTION_NAME>
URL Parameters
Parameter | Description |
---|---|
PROJECT_ID | The ID of the project of interest |
COLLECTION_NAME | The name of the collection of interest |
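The `total_count`, `count`, and `offset` fields in the response suggest offset-based pagination. Assuming those field semantics (all records, records returned in this batch, and the batch's starting index), whether another request is needed can be derived like this; this is a sketch inferred from the sample response, not an official client:

```javascript
// Given a collection response, compute the offset for the next request,
// or null when every record has already been fetched.
function nextOffset(resp) {
  const fetched = resp.offset + resp.count;
  return fetched < resp.total_count ? fetched : null; // null => done
}

const firstPage = { total_count: 5, count: 2, offset: 0, data: [] };
console.log(nextOffset(firstPage)); // 2

const lastPage = { total_count: 5, count: 1, offset: 4, data: [] };
console.log(nextOffset(lastPage)); // null
```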
Scripts
Run a script
import requests
from requests.auth import HTTPBasicAuth

try:
    r = requests.post('https://api.scrapex.ai/v1/scripts/<SCRIPT_ID>/start',
                      json={"params": {}},
                      auth=HTTPBasicAuth('YOUR_API_KEY', ''))
    r.raise_for_status()
    print(r.json())
except requests.exceptions.HTTPError as err:
    print(err.response.json())
curl "https://api.scrapex.ai/v1/scripts/<SCRIPT_ID>/start" -u "YOUR_API_KEY:" -X POST -H "Content-type: application/json" -d '{"params": {}}'
import fetch from "node-fetch";

(async () => {
  let body = { params: {} };
  try {
    let res = await fetch(
      "https://api.scrapex.ai/v1/scripts/<SCRIPT_ID>/start",
      {
        method: "POST",
        body: JSON.stringify(body),
        headers: {
          "Content-Type": "application/json",
          Authorization: `Basic ${Buffer.from(
            "YOUR_API_KEY" + ":" + ""
          ).toString("base64")}`,
        },
      }
    );
    let data = await res.json();
    console.log(data);
  } catch (e) {
    console.log(e);
  }
})();
The above command returns JSON structured like this:
{
"params": {},
"script_id": "<SCRIPT_ID>",
"run_id": "<RUN_ID>"
}
This endpoint starts a script run.
HTTP Request
POST https://api.scrapex.ai/v1/scripts/<SCRIPT_ID>/start
URL Parameters
Parameter | Description |
---|---|
SCRIPT_ID | The ID of the script to be run |
BODY Parameters
Parameter | Description | Required |
---|---|---|
params | An object containing the script params | YES |
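The `params` object overrides the script's preset defaults at run time. Building the POST body by overlaying per-run overrides on the defaults can be sketched as follows; the parameter names (`maxPages`, `region`) are made up for illustration:

```javascript
// Build the POST body for /scripts/<SCRIPT_ID>/start by overlaying
// run-time overrides on the script's default params.
// The parameter names here are hypothetical examples.
function buildStartBody(defaults, overrides = {}) {
  return { params: { ...defaults, ...overrides } };
}

const body = buildStartBody({ maxPages: 10, region: "us" }, { maxPages: 2 });
console.log(JSON.stringify(body));
// {"params":{"maxPages":2,"region":"us"}}
```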
Check status of a script
import requests
from requests.auth import HTTPBasicAuth

try:
    r = requests.get('https://api.scrapex.ai/v1/scripts/<SCRIPT_ID>/runs/<RUN_ID>',
                     auth=HTTPBasicAuth('YOUR_API_KEY', ''))
    r.raise_for_status()
    print(r.json())
except requests.exceptions.HTTPError as err:
    print(err.response.json())
curl "https://api.scrapex.ai/v1/scripts/<SCRIPT_ID>/runs/<RUN_ID>" -u "YOUR_API_KEY:"
import fetch from "node-fetch";

(async () => {
  try {
    let res = await fetch(
      "https://api.scrapex.ai/v1/scripts/<SCRIPT_ID>/runs/<RUN_ID>",
      {
        method: "GET",
        headers: {
          Authorization: `Basic ${Buffer.from(
            "YOUR_API_KEY" + ":" + ""
          ).toString("base64")}`,
        },
      }
    );
    let data = await res.json();
    console.log(data);
  } catch (e) {
    console.log(e);
  }
})();
The above command returns JSON structured like this:
{
"id": "<ID>",
"script_id": "<SCRIPT_ID>",
"metadata": {
"content-type": "application/json"
},
"response": {},
"status": 12,
"console_logs": [],
"ts_mod": "2022-01-12 10:43:49.924191+05:30",
"ts": "2022-01-12 10:43:42.112979+05:30",
"account_id": "6d84f178-20d2-11eb-beda-8752d362e34c",
"ts_start": "2022-01-12 10:43:42.19+05:30",
"ts_end": "2022-01-12 10:43:49.923+05:30",
"type": 0,
"script_job_id": null
}
This endpoint retrieves a specific run's details.
HTTP Request
GET https://api.scrapex.ai/v1/scripts/<SCRIPT_ID>/runs/<RUN_ID>
URL Parameters
Parameter | Description |
---|---|
SCRIPT_ID | The ID of the script of interest |
RUN_ID | The ID of the run of interest |
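Because a script run is asynchronous, clients typically poll this endpoint until the run finishes. A generic polling helper is sketched below with an injected fetcher and completion check, since the meaning of the numeric `status` field is not documented here (treating `12` as "finished" is an assumption drawn from the sample response):

```javascript
// Poll an async operation until `isDone` reports completion.
// `fetchStatus` is any function returning a promise of the run record,
// e.g. a wrapper around GET /scripts/<SCRIPT_ID>/runs/<RUN_ID>.
async function pollRun(fetchStatus, isDone, { intervalMs = 1000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await fetchStatus();
    if (isDone(run)) return run;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("run did not finish in time");
}

// Example with a stub fetcher that completes on the third poll.
let calls = 0;
pollRun(
  async () => (++calls < 3 ? { status: 1 } : { status: 12 }),
  (run) => run.status === 12, // assumption: 12 marks a finished run
  { intervalMs: 10 }
).then((run) => console.log(run.status)); // 12
```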
Javascript API
Overview
Scripts are executable crawlers that can perform pre-configured automation along with scraping data from websites. Scripts can be accessed from the Scripts table on the project details page, under Projects.
A script has the following attributes:
- Script ID - a unique ID identifying each script
- Script Name - a customizable name for the script
- Script References - references to existing preset scrapers in scrapex that can be invoked in scripts using specific aliases
- Script Params - runtime parameters with preset default values that can be overridden at run time
- User Script - the driver logic written by the user
User script structure
let {newPage, end, except, extract, extractAndSave, store, runStore, waitFor} = __sandbox;
let {params, } = OPTIONS;
(async () => { try { //---> prefix
  // -- START --
  const page = await newPage()
  // CUSTOM LOGIC ON page ---> user script
  // -- END --
  end()
} catch(e) { except(e) } })(); // ---> suffix
The user script can be edited on the script editor page. Only the logic between the START and END markers is editable; the prefix and suffix are read-only.
Sample User Script
let {newPage, end, except, extract, extractAndSave, store, runStore, waitFor} = __sandbox;
let {params, } = OPTIONS;
(async () => { try {
  // -- START --
  const page = await newPage();                    // create a new page
  await page.goto('https://www.example.org');      // open example.org
  await store.saveOne('store', {id: 1, msg: 'data'}) // save to the store
  console.log(await store.getOne('store', 1))      // fetch from the store
  if (await page.exists('a')) {                    // check if an anchor exists
    await page.click('a');                         // click the anchor
    await waitFor(2000);                           // wait 2 seconds
    await page.saveSnapshot('clicked the anchor'); // save a snapshot of the page
  }
  await page.close();                              // close the page
  // -- END --
  end()
} catch(e) { except(e) } })();
Here is a sample script that performs certain actions:
NOTE On script errors, snapshots of all valid open pages are saved. If none are found, the pages most likely never had any context in the first place.
Script Objects
page
: A page object returned by an awaited `newPage()` call. It loads a webpage and provides methods to interact with it.

references
: External references to scrapers defined in scrapex, so that the existing tag-based APIs can be used from the script.

store
: A project-level store that shares its inventory with other scripts in the same project. Use this store when multiple scripts extract data from different websites but the data should be accessed from one store for uniformity. This is the primary store choice.

runStore
: A run-level store intended for run-specific data. Although it can store any data, it is best reserved for debugging: it lacks the fetch APIs, so the user script cannot read the stored data back, and the data can only be inspected manually through the UI.

params
: User parameters passed to the script, used to access passed values. Defaults can be overridden at run time to use non-default values.
General functions
end()
Terminates the script.
- returns: <[void]> terminates the session
extractAndSave(scraper, url[, idFn])
Extracts the scrapex scraper's data from the given url and saves it to the project-level store under the name of the scraper.
- `scraper` <[string]> Name of the scrapex scraper
- `url` <[string]> URL to crawl
- `idFn` <[Function]> An ID-generating function that creates the data ID
- returns: <[Promise]> Promise which resolves once the extracted data has been saved
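The shape of `idFn` is not specified above, so the sketch below assumes it receives the extracted record and must return a stable string ID; both the assumption and the field names (`url`, `title`) are hypothetical:

```javascript
// A hypothetical ID-generating function for extractAndSave. It assumes
// idFn receives the extracted record and returns a deterministic string ID,
// so re-extracting the same page overwrites the same record.
function idFromUrlAndTitle(record) {
  // Lower-case the title and collapse non-alphanumerics into hyphens
  // so the same record always maps to the same ID.
  const slug = String(record.title || "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${record.url}#${slug}`;
}

console.log(idFromUrlAndTitle({ url: "https://example.org/a", title: "Hello, World!" }));
// https://example.org/a#hello-world
```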
newPage()
Spawns a new page and returns a promise resolving to the page object. The page is a Chromium tab instance.
- returns: <[Promise]<[Page]>> Promise which resolves to a page object
waitFor(timeout)
Returns a promise that resolves after timeout milliseconds have passed. (Not page.waitFor.)
- `timeout` <[number]> the amount of time to wait, in milliseconds
- returns: <[Promise]> Promise which resolves after timeout milliseconds
class: Page
const page = await newPage();
await page.goto('https://example.com');
await page.saveSnapshot('example-page');
A page instance can be spawned by using an awaited `newPage()` call. Page provides methods to interact with a single tab in Chromium. You can spawn a maximum of two pages in parallel in one user script.
This example creates a page, navigates it to a URL, and then saves a snapshot:
page.click(selector[, options])
const [response] = await Promise.all([
  page.waitForNavigation(waitOptions),
  page.click(selector, clickOptions),
]);
Click an element on the page specified by its CSS selector.

- `selector` <[string]> A selector to search for an element to click. If there are multiple elements satisfying the selector, the method throws.
- `options` <[Object]>
  - `button` <"left"|"right"|"middle"> Defaults to `left`.
  - `clickCount` <[number]> Defaults to 1.
  - `delay` <[number]> Time to wait between `mousedown` and `mouseup` in milliseconds. Defaults to 0.
- returns: <[Promise]> Promise which resolves when the element matching `selector` is successfully clicked. The Promise will be rejected if there is no element matching `selector`.

This method fetches an element with `selector`, scrolls it into view if needed, and then uses `page.mouse` to click in the center of the element. If there's no element matching `selector`, the method throws an error.

Bear in mind that if `click()` triggers a navigation event and there's a separate `page.waitForNavigation()` promise to be resolved, you may end up with a race condition that yields unexpected results. The correct pattern for click and wait for navigation is the `Promise.all` pattern shown in the example.

NOTE This race condition is handled by the `page.clickAndWait` API.
page.clickAndWait(selector[, options])
Click an element on the page specified by its CSS selector, then wait for navigation to finish.

- `selector` <[string]> A selector to search for an element to click. If there are multiple elements satisfying the selector, the method throws.
- `options` <[Object]> Navigation parameters which might have the following properties:
  - `timeout` <[number]> Maximum navigation time in milliseconds. Defaults to 30 seconds; pass `0` to disable the timeout.
  - `waitUntil` <"load"|"domcontentloaded"|"networkidle0"|"networkidle2"|Array> When to consider navigation succeeded. Defaults to `load`. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
    - `load` - consider navigation to be finished when the `load` event is fired.
    - `domcontentloaded` - consider navigation to be finished when the `DOMContentLoaded` event is fired.
    - `networkidle0` - consider navigation to be finished when there are no more than 0 network connections for at least `500` ms.
    - `networkidle2` - consider navigation to be finished when there are no more than 2 network connections for at least `500` ms.
- returns: <[Promise]> Promise chain which resolves when the element matching `selector` is successfully clicked and navigation has been waited on. The Promise will be rejected if there is no element matching `selector`.

This method fetches an element with `selector`, scrolls it into view if needed, and then uses `page.mouse` to click in the center of the element. If there's no element matching `selector`, the method throws an error.
page.clickTag(scraper, tag[, options])
const [response] = await Promise.all([
  page.waitForNavigation(waitOptions),
  page.clickTag(scraper, tag, clickOptions),
]);
Click an element on the page identified by a scraper tag.

- `scraper` <[string]> A scraper reference name for the scrapex scraper configured using the references panel.
- `tag` <[string]> A tag in the `scraper` whose selector identifies the element to click. If there are multiple elements satisfying the selector, the method throws.
- `options` <[Object]>
  - `button` <"left"|"right"|"middle"> Defaults to `left`.
  - `clickCount` <[number]> Defaults to 1.
  - `delay` <[number]> Time to wait between `mousedown` and `mouseup` in milliseconds. Defaults to 0.
- returns: <[Promise]> Promise which resolves when the element matching the tag's selector is successfully clicked. The Promise will be rejected if there is no matching element.

This method fetches the element, scrolls it into view if needed, and then uses `page.mouse` to click in the center of the element. If there's no matching element, the method throws an error.

Bear in mind that if `clickTag()` triggers a navigation event and there's a separate `page.waitForNavigation()` promise to be resolved, you may end up with a race condition that yields unexpected results. The correct pattern for click and wait for navigation is the `Promise.all` pattern shown in the example.

NOTE This race condition is handled by the `page.clickTagAndWait` API.
page.clickTagAndWait(scraper, tag[, options])
Click an element on the page identified by a scraper tag, then wait for navigation to finish.

- `scraper` <[string]> A scraper reference name for the scrapex scraper configured using the references panel.
- `tag` <[string]> A tag in the `scraper` whose selector identifies the element to click. If there are multiple elements satisfying the selector, the method throws.
- `options` <[Object]> Navigation parameters which might have the following properties:
  - `timeout` <[number]> Maximum navigation time in milliseconds. Defaults to 30 seconds; pass `0` to disable the timeout. The default value can be changed by using the `page.setDefaultNavigationTimeout(timeout)` or `page.setDefaultTimeout(timeout)` methods.
  - `waitUntil` <"load"|"domcontentloaded"|"networkidle0"|"networkidle2"|Array> When to consider navigation succeeded. Defaults to `load`. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
    - `load` - consider navigation to be finished when the `load` event is fired.
    - `domcontentloaded` - consider navigation to be finished when the `DOMContentLoaded` event is fired.
    - `networkidle0` - consider navigation to be finished when there are no more than 0 network connections for at least `500` ms.
    - `networkidle2` - consider navigation to be finished when there are no more than 2 network connections for at least `500` ms.
- returns: <[Promise]> Promise chain which resolves when the element matching the tag's selector is successfully clicked and navigation has been waited on. The Promise will be rejected if there is no matching element.

This method fetches the element, scrolls it into view if needed, and then uses `page.mouse` to click in the center of the element. If there's no matching element, the method throws an error.
page.close([options])
Closes the specified tab.
- `options` <[Object]>
  - `runBeforeUnload` <[boolean]> Defaults to `false`. Whether to run the before unload page handlers.
- returns: <[Promise]>

By default, `page.close()` does not run `beforeunload` handlers.

NOTE If `runBeforeUnload` is passed as true, a `beforeunload` dialog might be summoned.
page.exists(selector)
Checks if the specified selector exists on the page.
- `selector` <[string]> A selector to search for. If there are multiple elements satisfying the selector, the method resolves to true if even one exists.
- returns: <[Promise]<[boolean]>> Promise that resolves to either true or false depending upon the selector's existence on the page.
page.extract(scraper)
Extracts content from the page using the given scraper reference.

- `scraper` <[string]> Name of the scraper reference
- returns: <[Promise]> Promise which resolves to the extracted text content for the scraper's tags.
page.goto(url[, options])
Navigates the page to the specified URL.

- `url` <[string]> URL to navigate the page to. The URL should include the scheme, e.g. `https://`.
- `options` <[Object]> Navigation parameters which might have the following properties:
  - `timeout` <[number]> Maximum navigation time in milliseconds. Defaults to 30 seconds; pass `0` to disable the timeout. The default value can be changed by using the `page.setDefaultNavigationTimeout(timeout)` or `page.setDefaultTimeout(timeout)` methods.
  - `waitUntil` <"load"|"domcontentloaded"|"networkidle0"|"networkidle2"|Array> When to consider navigation succeeded. Defaults to `load`. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
    - `load` - consider navigation to be finished when the `load` event is fired.
    - `domcontentloaded` - consider navigation to be finished when the `DOMContentLoaded` event is fired.
    - `networkidle0` - consider navigation to be finished when there are no more than 0 network connections for at least `500` ms.
    - `networkidle2` - consider navigation to be finished when there are no more than 2 network connections for at least `500` ms.
  - `referer` <[string]> Referer header value. If provided, it will take preference over the referer header value set by `page.setExtraHTTPHeaders()`.
- returns: <[Promise]<?[HTTPResponse]>> Promise which resolves to the main resource response. In case of multiple redirects, the navigation will resolve with the response of the last redirect.

`page.goto` will throw an error if:

- there's an SSL error (e.g. in case of self-signed certificates).
- the target URL is invalid.
- the `timeout` is exceeded during navigation.
- the remote server does not respond or is unreachable.
- the main resource failed to load.

`page.goto` will not throw an error when any valid HTTP status code is returned by the remote server, including 404 "Not Found" and 500 "Internal Server Error". The status code for such responses can be retrieved by calling `response.status()`.

NOTE `page.goto` either throws an error or returns a main resource response. The only exceptions are navigation to `about:blank` or navigation to the same URL with a different hash, which would succeed and return `null`.
page.keyboard
The page's virtual keyboard object.
- returns: <[Keyboard]>
page.mouse
The page's virtual mouse object.
- returns: <[Mouse]>
page.reload([options])
Reloads the page.

- `options` <[Object]> Navigation parameters which might have the following properties:
  - `timeout` <[number]> Maximum navigation time in milliseconds. Defaults to 30 seconds; pass `0` to disable the timeout. The default value can be changed by using the `page.setDefaultNavigationTimeout(timeout)` or `page.setDefaultTimeout(timeout)` methods.
  - `waitUntil` <"load"|"domcontentloaded"|"networkidle0"|"networkidle2"|Array> When to consider navigation succeeded. Defaults to `load`. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
    - `load` - consider navigation to be finished when the `load` event is fired.
    - `domcontentloaded` - consider navigation to be finished when the `DOMContentLoaded` event is fired.
    - `networkidle0` - consider navigation to be finished when there are no more than 0 network connections for at least `500` ms.
    - `networkidle2` - consider navigation to be finished when there are no more than 2 network connections for at least `500` ms.
- returns: <[Promise]<[HTTPResponse]>> Promise which resolves to the main resource response. In case of multiple redirects, the navigation will resolve with the response of the last redirect.
page.saveSnapshot(name)
Saves a snapshot of the current page.

- `name` <[string]> Name of the snapshot
- returns: <[Promise]<[void]>> Promise which resolves once the snapshot is saved.
page.scrape(scraper)
Extracts content from the page using the given scraper reference.

- `scraper` <[string]> Name of the scraper reference
- returns: <[Promise]> Promise which resolves to the extracted text content for the scraper's tags.
NOTE This API is deprecated. Please check out the extract API.
page.scrapeSelector(selector[, options])
Extracts content from the page by passing a barebones scraper definition.

- `selector` <[string]> Selector to scrape on the page
- `options` <[Object]>:
  - `required` <[boolean]> Whether it is a required tag
  - `type` <[string]> What type of tag it is, e.g. standard, etc.
  - `source_type` <[string]> Source of the tag: dom/meta/builtin
  - `data_type` <[string]> The data type extracted: text, number, etc.
  - `extractor` <[Object]>:
    - `type` <[string]> type/prop/attr
    - `params` <[Object]>:
      - `name` <[string]> href, etc.
  - `modifiers` <[Array]<[Object]>>:
    - `type` <[string]> regex, etc.
    - `param` <[string]> exp
- returns: <[Promise]> Promise which resolves to the extracted text content.
page.select(selector, ...values)
page.select('select#colors', 'blue'); // single selection
page.select('select#colors', 'red', 'green', 'blue'); // multiple selections
Selects options in a `<select>` element on the page.

- `selector` <[string]> A [selector] to query the page for
- `...values` <...[string]> Values of options to select. If the `<select>` has the `multiple` attribute, all values are considered, otherwise only the first one is taken into account.
- returns: <[Promise]<[Array]<[string]>>> An array of option values that have been successfully selected.

Triggers a `change` and `input` event once all the provided options have been selected. If there's no `<select>` element matching `selector`, the method throws an error.
page.tagExists(scraper, tag)
Checks if the element identified by a scraper tag exists on the page.

- `scraper` <[string]> A scraper reference name for the scrapex scraper configured using the references panel.
- `tag` <[string]> A tag in the `scraper` whose selector identifies the element. If there are multiple elements satisfying the selector, the method resolves to true if even one exists.
- returns: <[Promise]<[boolean]>> Promise that resolves to either true or false depending upon the selector's existence on the page.
page.waitFor(selectorOrFunctionOrTimeout[, options[, ...args]])
Explicitly waits on the page for a selector, predicate, or timeout.

- `selectorOrFunctionOrTimeout` <[string]|[number]|[function]> A [selector], predicate, or timeout to wait for
- `options` <[Object]> Optional waiting parameters
  - `visible` <[boolean]> wait for the element to be present in the DOM and to be visible. Defaults to `false`.
  - `timeout` <[number]> maximum time to wait, in milliseconds. Defaults to `30000` (30 seconds). Pass `0` to disable the timeout.
  - `hidden` <[boolean]> wait for the element to not be found in the DOM or to be hidden. Defaults to `false`.
  - `polling` <[string]|[number]> An interval at which the `pageFunction` is executed. Defaults to `raf`. If `polling` is a number, it is treated as an interval in milliseconds at which the function is executed. If `polling` is a string, it can be one of the following values:
    - `raf` - constantly execute `pageFunction` in the `requestAnimationFrame` callback. This is the tightest polling mode and is suitable for observing styling changes.
    - `mutation` - execute `pageFunction` on every DOM mutation.
- `...args` <...[Serializable]|[JSHandle]> Arguments to pass to `pageFunction`
- returns: <[Promise]<[JSHandle]>> Promise which resolves to a JSHandle of the success value

This method is deprecated. Use the more explicit API methods instead:

- `page.waitForSelector`
- `page.waitForTag`
NOTE This method behaves differently with respect to the type of the first parameter.
page.waitForNavigation([options])
const [response] = await Promise.all([
  page.waitForNavigation(), // The promise resolves after navigation has finished
  page.click('a.my-link'), // Clicking the link will indirectly cause a navigation
]);
Waits for page navigation to finish.

- `options` <[Object]> Navigation parameters which might have the following properties:
  - `timeout` <[number]> Maximum navigation time in milliseconds. Defaults to 30 seconds; pass `0` to disable the timeout.
  - `waitUntil` <"load"|"domcontentloaded"|"networkidle0"|"networkidle2"|Array> When to consider navigation succeeded. Defaults to `load`. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
    - `load` - consider navigation to be finished when the `load` event is fired.
    - `domcontentloaded` - consider navigation to be finished when the `DOMContentLoaded` event is fired.
    - `networkidle0` - consider navigation to be finished when there are no more than 0 network connections for at least `500` ms.
    - `networkidle2` - consider navigation to be finished when there are no more than 2 network connections for at least `500` ms.
- returns: <[Promise]<?[HTTPResponse]>> Promise which resolves to the main resource response. In case of multiple redirects, the navigation will resolve with the response of the last redirect. In case of navigation to a different anchor or navigation due to History API usage, the navigation will resolve with `null`.

This resolves when the page navigates to a new URL or reloads. It is useful when you run code that will indirectly cause the page to navigate, as in the example shown.
page.waitForSelector(selector[, options])
Waits for the selector to be available on the page.

- `selector` <[string]> A [selector] of an element to wait for. If there are multiple elements satisfying the selector, the method waits for all of them.
- `options` <[Object]> Optional waiting parameters
  - `visible` <[boolean]> wait for the element to be present in the DOM and to be visible, i.e. to not have `display: none` or `visibility: hidden` CSS properties. Defaults to `false`.
  - `hidden` <[boolean]> wait for the element to not be found in the DOM or to be hidden, i.e. have `display: none` or `visibility: hidden` CSS properties. Defaults to `false`.
  - `timeout` <[number]> maximum time to wait, in milliseconds. Defaults to `30000` (30 seconds). Pass `0` to disable the timeout.
- returns: <[Promise]<?[ElementHandle]>> Promise which resolves when the element specified by the selector string is added to the DOM. Resolves to `null` if waiting for `hidden: true` and the selector is not found in the DOM.

Waits for the `selector` to appear on the page. If at the moment of calling the method the `selector` already exists, the method returns immediately. If the selector doesn't appear after `timeout` milliseconds of waiting, the function throws.
NOTE Usage of the History API to change the URL is considered a navigation.
page.waitForTag(scraper, tag[, options])
Waits for the element identified by a scraper tag to be available on the page.

- `scraper` <[string]> A scraper reference name for the scrapex scraper configured using the references panel.
- `tag` <[string]> A tag in the `scraper` whose selector identifies the element to wait for. If there are multiple elements satisfying the selector, the method waits for all of them.
- `options` <[Object]> Optional waiting parameters
  - `visible` <[boolean]> wait for the element to be present in the DOM and to be visible, i.e. to not have `display: none` or `visibility: hidden` CSS properties. Defaults to `false`.
  - `hidden` <[boolean]> wait for the element to not be found in the DOM or to be hidden, i.e. have `display: none` or `visibility: hidden` CSS properties. Defaults to `false`.
  - `timeout` <[number]> maximum time to wait, in milliseconds. Defaults to `30000` (30 seconds). Pass `0` to disable the timeout.
- returns: <[Promise]<?[ElementHandle]>> Promise which resolves when the element specified by the tag's selector is added to the DOM. Resolves to `null` if waiting for `hidden: true` and the selector is not found in the DOM.

Waits for the tag's selector to appear on the page. If at the moment of calling the method the selector already exists, the method returns immediately. If the selector doesn't appear after `timeout` milliseconds of waiting, the function throws.
NOTE Usage of the History API to change the URL is considered a navigation.
class: Keyboard
await page.keyboard.type('Hello World!');
await page.keyboard.press('ArrowLeft');

await page.keyboard.down('Shift');
for (let i = 0; i < ' World'.length; i++)
  await page.keyboard.press('ArrowLeft');
await page.keyboard.up('Shift');

await page.keyboard.press('Backspace');
// Result text will end up saying 'Hello!'

await page.keyboard.down('Shift');
await page.keyboard.press('KeyA');
await page.keyboard.up('Shift');
// An example of pressing `A`
Keyboard provides an API for managing a virtual keyboard. The high-level API is `keyboard.type`, which takes raw characters and generates proper keydown, keypress/input, and keyup events on your page.

For finer control, you can use `keyboard.down`, `keyboard.up`, and `keyboard.sendCharacter` to manually fire events as if they were generated from a real keyboard.

An example of holding down `Shift` in order to select and delete some text is shown in the example.

NOTE On macOS, keyboard shortcuts like `⌘ A` -> Select All do not work. See #1313.
keyboard.down(key[, options])
key
<[string]> Name of key to press, such asArrowLeft
. See [USKeyboardLayout] for a list of all key names.options
<[Object]>text
<[string]> If specified, generates an input event with this text.
- returns: <[Promise]>
Dispatches a keydown
event.
If key
is a single character and no modifier keys besides Shift
are being held down, a keypress
/input
event will also be generated. The text
option can be specified to force an input event to be generated.
If key
is a modifier key, Shift
, Meta
, Control
, or Alt
, subsequent key presses will be sent with that modifier active. To release the modifier key, use keyboard.up
.
After the key is pressed once, subsequent calls to keyboard.down will have repeat set to true. To release the key, use keyboard.up.
NOTE Modifier keys DO influence keyboard.down. Holding down Shift will type the text in upper case.
keyboard.press(key[, options])
- key <[string]> Name of key to press, such as ArrowLeft. See [USKeyboardLayout] for a list of all key names.
- options <[Object]>
  - text <[string]> If specified, generates an input event with this text.
  - delay <[number]> Time to wait between keydown and keyup in milliseconds. Defaults to 0.
- returns: <[Promise]>
If key is a single character and no modifier keys besides Shift are being held down, a keypress/input event will also be generated. The text option can be specified to force an input event to be generated.
NOTE Modifier keys DO affect keyboard.press. Holding down Shift will type the text in upper case.
Shortcut for keyboard.down and keyboard.up.
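The down/up decomposition can be sketched with a toy model. The MockKeyboard below is hypothetical (it is not the real Keyboard class); it only records event names to illustrate that press dispatches keydown, optionally waits for delay, then dispatches keyup:

```javascript
// MockKeyboard is a hypothetical stand-in that records event order.
class MockKeyboard {
  constructor() {
    this.events = [];
  }
  async down(key) {
    this.events.push(`keydown:${key}`);
  }
  async up(key) {
    this.events.push(`keyup:${key}`);
  }
  async press(key, options = {}) {
    await this.down(key);
    if (options.delay) {
      await new Promise((resolve) => setTimeout(resolve, options.delay));
    }
    await this.up(key);
  }
}

const kb = new MockKeyboard();
kb.press('ArrowLeft').then(() => {
  console.log(kb.events); // [ 'keydown:ArrowLeft', 'keyup:ArrowLeft' ]
});
```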
keyboard.sendCharacter(char)
page.keyboard.sendCharacter('嗨');
- char <[string]> Character to send into the page.
- returns: <[Promise]>

Dispatches a keypress and input event. This does not send a keydown or keyup event.
NOTE Modifier keys DO NOT affect keyboard.sendCharacter. Holding down Shift will not type the text in upper case.
keyboard.type(text[, options])
await page.keyboard.type('Hello'); // Types instantly
await page.keyboard.type('World', { delay: 100 }); // Types slower, like a user
- text <[string]> A text to type into a focused element.
- options <[Object]>
  - delay <[number]> Time to wait between key presses in milliseconds. Defaults to 0.
- returns: <[Promise]>
Sends a keydown, keypress/input, and keyup event for each character in the text.
To press a special key, like Control or ArrowDown, use keyboard.press.
NOTE Modifier keys DO NOT affect keyboard.type. Holding down Shift will not type the text in upper case.
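The per-character event sequence can be illustrated with a small sketch. The MockKeyboard here is hypothetical, not the real class; it only records which events type would dispatch for each character, per the description above:

```javascript
// Hypothetical mock recording what type() dispatches:
// keydown, keypress, and keyup for each character in the text.
class MockKeyboard {
  constructor() {
    this.events = [];
  }
  async type(text, options = {}) {
    for (const ch of text) {
      this.events.push(`keydown:${ch}`, `keypress:${ch}`, `keyup:${ch}`);
      if (options.delay) {
        await new Promise((resolve) => setTimeout(resolve, options.delay));
      }
    }
  }
}

const kb = new MockKeyboard();
kb.type('Hi').then(() => {
  console.log(kb.events.length); // 6 — three events per character
});
```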
keyboard.up(key)
- key <[string]> Name of key to release, such as ArrowLeft. See [USKeyboardLayout] for a list of all key names.
- returns: <[Promise]>

Dispatches a keyup event.
class: Mouse
// Using 'page.mouse' to trace a 100x100 square.
await page.mouse.move(0, 0);
await page.mouse.down();
await page.mouse.move(0, 100);
await page.mouse.move(100, 100);
await page.mouse.move(100, 0);
await page.mouse.move(0, 0);
await page.mouse.up();
The Mouse class operates in main-frame CSS pixels relative to the top-left corner of the viewport.
Every page object has its own Mouse, accessible with page.mouse.
Note that mouse events trigger synthetic MouseEvents. This means that it does not fully replicate the functionality of what a normal user would be able to do with their mouse.
mouse.click(x, y[, options])
- x <[number]>
- y <[number]>
- options <[Object]>
  - button <"left"|"right"|"middle"> Defaults to left.
  - clickCount <[number]> Defaults to 1. See [UIEvent.detail].
  - delay <[number]> Time to wait between mousedown and mouseup in milliseconds. Defaults to 0.
- returns: <[Promise]>
Shortcut for mouse.move, mouse.down, and mouse.up.
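This shortcut can be sketched with a toy model. MockMouse is hypothetical (not the real Mouse class); it records event order only, showing that click moves the cursor, presses the button, optionally waits for delay, then releases:

```javascript
// Hypothetical MockMouse illustrating the click() decomposition.
class MockMouse {
  constructor() {
    this.events = [];
  }
  async move(x, y) {
    this.events.push(`move:${x},${y}`);
  }
  async down(options = {}) {
    this.events.push(`down:${options.button || 'left'}`);
  }
  async up(options = {}) {
    this.events.push(`up:${options.button || 'left'}`);
  }
  async click(x, y, options = {}) {
    await this.move(x, y);
    await this.down(options);
    if (options.delay) {
      await new Promise((resolve) => setTimeout(resolve, options.delay));
    }
    await this.up(options);
  }
}

const mouse = new MockMouse();
mouse.click(50, 50, { button: 'right' }).then(() => {
  console.log(mouse.events); // [ 'move:50,50', 'down:right', 'up:right' ]
});
```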
mouse.down([options])
- options <[Object]>
  - button <"left"|"right"|"middle"> Defaults to left.
  - clickCount <[number]> Defaults to 1. See [UIEvent.detail].
- returns: <[Promise]>

Dispatches a mousedown event.
mouse.move(x, y[, options])
- x <[number]>
- y <[number]>
- options <[Object]>
  - steps <[number]> Defaults to 1. Sends intermediate mousemove events.
- returns: <[Promise]>

Dispatches a mousemove event.
mouse.up([options])
- options <[Object]>
  - button <"left"|"right"|"middle"> Defaults to left.
  - clickCount <[number]> Defaults to 1. See [UIEvent.detail].
- returns: <[Promise]>

Dispatches a mouseup event.
mouse.wheel([options])
await page.goto(
'https://mdn.mozillademos.org/en-US/docs/Web/API/Element/wheel_event$samples/Scaling_an_element_via_the_wheel?revision=1587366'
);
await page.mouse.wheel({ deltaY: -100 });
- options <[Object]>
  - deltaX <[number]> X delta in CSS pixels for the mouse wheel event (default: 0). Positive values emulate a scroll right and negative values a scroll left.
  - deltaY <[number]> Y delta in CSS pixels for the mouse wheel event (default: 0). Positive values emulate a scroll down and negative values a scroll up.
- returns: <[Promise]>

Dispatches a mousewheel event.
class: Store
await store.saveOne('random-store', {id: 'some', data: 'other'});
const data = await store.getOne('random-store', 'some');
Scrapex stores data in collections and serves it to the user through the script or API as required. Store is an outer-level store that shares its inventory with other scripts in the same Project. Use this store when multiple scripts may be extracting data from different websites but the data should be accessed from one store for uniformity.
Script store is the default store; the sample snippet shows a typical store interaction:
store.getOne(collection, id)
Retrieves a single record from the store.
- collection <[string]> Name of the collection in the store that has the data
- id <[string]> Id of the data item to fetch
- returns: <[Promise]<[Object]>> Promise that resolves to the requested data item
store.getAll(collection [, options])
await store.getAll('store', {
limit: 100,
offset: 500,
only: ['id']
})
Retrieves multiple records from the store.
- collection <[string]> Name of the collection in the store that has the data
- options <[Object]>:
  - limit <[number]> Maximum number of records fetched. Default is 50
  - offset <[number]> Offset from which to start fetching. Default is 0
  - only <[Array]<[string]>> Fetch only the listed columns from the DB instead of entire records
- returns: <[Promise]<[Array]<[Object]>>> Promise that resolves to all data items that were fetched as per the user request
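Because getAll returns at most limit records per call, fetching a large collection means paging with offset. The sketch below uses a hypothetical in-memory mockStore in place of the real Scrapex store object; only the getAll(collection, {limit, offset}) shape is taken from the description above:

```javascript
// mockStore is a hypothetical stand-in for the Scrapex store object.
const mockStore = {
  records: Array.from({ length: 120 }, (_, i) => ({ id: String(i) })),
  async getAll(collection, { limit = 50, offset = 0 } = {}) {
    return this.records.slice(offset, offset + limit);
  },
};

// Page through a collection until a short page signals the end.
async function fetchAll(store, collection, pageSize = 50) {
  const all = [];
  let offset = 0;
  for (;;) {
    const page = await store.getAll(collection, { limit: pageSize, offset });
    all.push(...page);
    if (page.length < pageSize) break; // fewer than pageSize: done
    offset += pageSize;
  }
  return all;
}

fetchAll(mockStore, 'store').then((items) => {
  console.log(items.length); // 120
});
```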
store.getIds(collection)
await store.getIds('store');
Retrieves all ids from the store as a list.
- collection <[string]> Name of the collection in the store that has the data
store.saveOne(collection, data[, metadata, idFn])
await store.saveOne('store', {data: "val"}, {metadata: 'json'}, () => {return 1;})
Saves one data item into the store.
- collection <[string]> Name of the collection in the store that has the data
- data <[Object]>:
  - id <[string]> The unique identification keyword for the datum. It is required; the method throws if it is not passed.
  - [key] <[string]> Data to be stored under the given key
- metadata <[Object]> The metadata of the object being stored
- idFn <[function]> An id-generating function, if required
- returns: <[Promise]> Promise that resolves to the data item being saved
store.saveMany(collection, records)
Saves several data items into the store.
- collection <[string]> Name of the collection in the store that has the data
- records <[Array]<[Object]>> Each record is a data item with the following fields:
  - id <[string]> The unique identification keyword for the datum. It is required; the method throws if it is not passed.
  - [key] <[string]> Data to be stored under the given key
- idFn <[function]> An id-generating function, if required
- returns: <[Promise]> Promise that resolves to the data items being saved
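The saveMany contract can be sketched with a hypothetical in-memory MockStore (not the real Scrapex store): every record needs an id, the method throws otherwise, and records are keyed by id within a named collection:

```javascript
// MockStore is a hypothetical in-memory sketch of the contract above.
class MockStore {
  constructor() {
    this.collections = new Map();
  }
  async saveMany(collection, records) {
    if (!this.collections.has(collection)) {
      this.collections.set(collection, new Map());
    }
    const coll = this.collections.get(collection);
    for (const record of records) {
      if (!record.id) throw new Error('each record requires an id');
      coll.set(record.id, record); // in this sketch, same id overwrites
    }
    return records;
  }
  async getIds(collection) {
    const coll = this.collections.get(collection);
    return coll ? [...coll.keys()] : [];
  }
}

const store = new MockStore();
store
  .saveMany('products', [{ id: 'p1', name: 'Widget' }, { id: 'p2', name: 'Gadget' }])
  .then(() => store.getIds('products'))
  .then((ids) => console.log(ids)); // [ 'p1', 'p2' ]
```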
class: runStore
await runstore.saveOne('random-store', {id: 'some', data: 'other'});
await runstore.saveMany('random-store', [{id: 'some2', data: 'other'}]);
Scrapex stores data in collections and serves it to the user through the script or API as required. runStore is an inner-level store intended for run-specific data. Although it can store any data, it is best used for debugging purposes, since the script has no access to the stored data; records can only be fetched manually through the UI. It lacks the fetch APIs.
Run store lacks the get APIs but retains both save APIs the other stores have. The sample snippet shows a typical store interaction:
runStore.saveOne(collection, data[, metadata, idFn])
await runstore.saveOne('store', {data: "val"}, {metadata: 'json'}, () => {return 1;})
Saves one data item into the run store.
- collection <[string]> Name of the collection in the store that has the data
- data <[Object]>:
  - id <[string]> The unique identification keyword for the datum. It is required; the method throws if it is not passed.
  - [key] <[string]> Data to be stored under the given key
- metadata <[Object]> The metadata of the object being stored
- idFn <[function]> An id-generating function, if required
- returns: <[Promise]> Promise that resolves to the data item being saved
runStore.saveMany(collection, records)
Saves several data items into the run store.
- collection <[string]> Name of the collection in the store that has the data
- records <[Array]<[Object]>> Each record is a data item with the following fields:
  - id <[string]> The unique identification keyword for the datum. It is required; the method throws if it is not passed.
  - [key] <[string]> Data to be stored under the given key
- idFn <[function]> An id-generating function, if required
- returns: <[Promise]> Promise that resolves to the data items being saved