Refining a selection in the Configurator

The refining action is best illustrated using an example.

Goal:

Scrape the course links from the Class Central computer science subject page.

Tags

Tags     DataType      Extractor
Links    Text Array    Prop

Sample Data

Tag Name    Data
Links       ['https://www.classcentral.com/course/machine-learning-835', 'https://www.classcentral.com/course/information-systems-audit-17979', …]

Step 1: Create a project

  1. Head over to https://app.scrapex.ai/ and click on the New Project button.
  2. Enter a project name and enter https://www.classcentral.com/subject/cs as the URL.
  3. Leave the proxy settings unchanged and click on the Create button.

Step 2: Configure the scraper

  1. Click on the scraper titled Default Scraper to open the Configurator.

  1. The Configurator opens with the Class Central website loaded in the cloud browser. The panel on the left contains five tools: Single Select, Multi Select, Group tool, Builtin, and Meta.

  1. As we are looking to collect all the course links, Multi Select is the right tool for this job.

    • Click on Multi Select (the 2nd tool in the left toolbar).
  2. Click on the first course title, Machine Learning. Ignore the post above it, which is an ad. As soon as you click on the course title, all similar course titles are selected automatically.

  1. Along with the course titles, extraneous elements were also selected. This is because the algorithm tries to predict the most general CSS selector that fits the user's inputs. Let us provide more inputs to refine the prediction.
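
The idea of predicting "the most general selector" and then narrowing it with more inputs can be sketched with a toy model. This is not the Configurator's actual algorithm; the page elements, class names (`item`, `text`), and function names below are all hypothetical and chosen only to mirror the walkthrough: one click selects too much, and each X rules out a broader selector.

```python
# Toy model: each element is (tag, set_of_classes). All names are hypothetical.
post = ("div", {"item"})
title_1 = ("h2", {"item", "text"})
title_2 = ("h2", {"item", "text"})
course_prop = ("span", {"item", "text"})
page = [post, title_1, title_2, course_prop]


def candidates(element):
    """Simple selectors (tag, .class, tag.class) that match one element."""
    tag, classes = element
    ordered = sorted(classes)
    return [tag] + [f".{c}" for c in ordered] + [f"{tag}.{c}" for c in ordered]


def matches(selector, element):
    """True if a tag / .class / tag.class selector matches the element."""
    tag, classes = element
    sel_tag, _, sel_class = selector.partition(".")
    return (not sel_tag or sel_tag == tag) and (not sel_class or sel_class in classes)


def most_general(page, keep, drop):
    """Pick the selector matching the most elements while covering every
    clicked element (keep) and none of the X-ed elements (drop)."""
    viable = [
        s for s in candidates(keep[0])
        if all(matches(s, e) for e in keep) and not any(matches(s, e) for e in drop)
    ]
    # Prefer broader coverage; break ties with the shorter selector.
    return max(viable, key=lambda s: (sum(matches(s, e) for e in page), -len(s)))


after_click = most_general(page, keep=[title_1], drop=[])
after_first_x = most_general(page, keep=[title_1], drop=[post])
after_second_x = most_general(page, keep=[title_1], drop=[post, course_prop])
```

In this toy page, the first click yields a selector that also covers the post and the course property; each rejected element removes the broadest remaining candidate until only the titles match.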

  1. Click X on the most irrelevant element. In this example, we click X on the entire post because we are only interested in the title. The number of highlighted elements decreases immediately.

  • Click X once more, this time to remove the course properties. All the course titles now appear in the preview. However, the goal is to extract links rather than title text.

  • Links are present as the href property of an anchor tag. Click on the dropdown icon to open the refine menu.

  • Inside the popup, we can see that an <h2> element is currently selected. Click on the <a> tag, which appears above the <h2>.
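
The refine menu moves the selection up the element hierarchy, from the heading to the anchor that wraps it. A minimal sketch of that relationship, assuming markup in which the <h2> sits inside an <a> (the snippet below is hypothetical, not Class Central's actual HTML):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the course title <h2> is wrapped by an <a>.
SNIPPET = """
<li>
  <a href="/course/machine-learning-835">
    <h2>Machine Learning</h2>
  </a>
</li>
""".strip()

root = ET.fromstring(SNIPPET)

# ElementTree has no parent pointers, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Return the tags of the enclosing elements, nearest first."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node.tag)
    return chain


h2 = root.find(".//h2")
chain = ancestors(h2)       # the elements "above" the <h2>
anchor = parent_of[h2]      # the <a> that the refine step selects
```

Here `chain` lists the elements above the heading, and `anchor` is the element that carries the link.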

  • You have successfully refined the selection, and all the anchor tags are now selected.

  • Open the options panel, change the extractor to Prop, and set the name to href.
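
To see what reading the href property gives compared with reading the text, here is a rough offline sketch using only the Python standard library. It assumes the page writes relative links (hypothetical markup) and resolves them against the page URL, which is how a browser's href property reports an anchor's target as an absolute URL. The class name `LinkCollector` is made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "https://www.classcentral.com/subject/cs"
# Hypothetical markup with a relative link, for illustration only.
SNIPPET = '<a href="/course/machine-learning-835"><h2>Machine Learning</h2></a>'


class LinkCollector(HTMLParser):
    """Collects both the text and the resolved href of each anchor."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # what a Prop extractor named href would aim for
        self.texts = []   # what a plain text extractor would return

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative paths against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


collector = LinkCollector(PAGE_URL)
collector.feed(SNIPPET)
links, texts = collector.links, collector.texts
```

The same anchor yields the title when read as text and the course URL when read through href, which is why the extractor has to change for this goal.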

  • We have successfully extracted all the course links. Click on Submit.
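
Once the Links array is available, a light clean-up pass can be useful before using it downstream. The sketch below is one possible approach, not part of the tool: it assumes course links share the prefix seen in the sample data, and the function name `clean_links` and the extra duplicate and non-course entries are hypothetical additions for illustration.

```python
# Prefix taken from the sample data; treating it as universal is an assumption.
COURSE_PREFIX = "https://www.classcentral.com/course/"


def clean_links(links):
    """Keep course links only and drop duplicates, preserving order."""
    seen = set()
    cleaned = []
    for link in links:
        if link.startswith(COURSE_PREFIX) and link not in seen:
            seen.add(link)
            cleaned.append(link)
    return cleaned


scraped = [
    "https://www.classcentral.com/course/machine-learning-835",
    "https://www.classcentral.com/course/information-systems-audit-17979",
    "https://www.classcentral.com/course/machine-learning-835",  # duplicate, added for illustration
    "https://www.classcentral.com/subject/cs",                   # non-course link, added for illustration
]
course_links = clean_links(scraped)
```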
