Scraping Hacker News using scripts

A practical example of a script that scrapes Hacker News.

Example Script:

let {newPage, end, except, extract, extractAndSave, store, runStore, waitFor} = __sandbox;
let {params, } = OPTIONS;
(async () => { try {
	// -- START --
	const page = await newPage()
	console.log('navigating to url')
	await page.goto(params.url)
	console.log('scraping url')
	// 'hn' is the alias of the scraper that returns authors, titles and points
	const {authors, titles, points} = await page.extract('hn')
	const storeBlob = titles.map((title, index) => {
		return {
			id: index, 
			title,
			author: authors[index],
			points: points[index]
		}
	})
	console.log('saving to store')
	await store.saveMany('hn-data', storeBlob)
	// -- END --
	end()
} catch(e) { except(e) } })();

Scrapex.ai HN Script

Downloadable Hackernews Project

Code Breakdown

The given snippet:

  • Opens a new page using the newPage() method
  • Navigates to the URL passed in params under the url key
  • Runs the scraper with the alias hn against the page. In this context, that scraper extracts the authors, titles and points from Hacker News.
  • Maps over the titles, using each title's index to look up the matching author and points
  • Builds a storeBlob: an array of objects, each holding the id, title, author and points of one story
  • Saves all the records to the project-level store under the name hn-data
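
The mapping step can be tried outside the sandbox. The following is a minimal sketch, assuming the hn scraper returns three parallel arrays of equal length; the sample values and the helper name buildStoreBlob are hypothetical and only stand in for real scraper output.

```javascript
// Hypothetical sample data standing in for the result of page.extract('hn').
const sample = {
  titles: ['First story', 'Second story', 'Third story'],
  authors: ['alice', 'bob', 'carol'],
  points: [120, 45, 7],
};

// Hypothetical helper: combines the parallel arrays into one record per
// story, pairing entries by index the same way the script above does.
function buildStoreBlob({ authors, titles, points }) {
  return titles.map((title, index) => ({
    id: index,
    title,
    author: authors[index],
    points: points[index],
  }));
}

const storeBlob = buildStoreBlob(sample);
console.log(storeBlob);
```

Because the records are paired by index, the approach relies on the scraper returning authors, titles and points in the same order. If one array is shorter than titles, the corresponding field comes out as undefined, so a length check before saving may be worthwhile.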