Today's Mission: Scrape date from a JavaScript powered table.

In a world full of people using modern solutions such as Selenium, the easiest solution may just be the oldest… using your brain + cURL.

I saw this contract tonight on UpWork.

UpWork freelance contract posting for creating an email list of migration agents in South Australia

Title: “Create email list of migration agents in South Australia”

Description: “We are looking for a skilled freelancer who can help us in creating an email list of migration agents in South Australia. The ideal candidate should have expertise in data entry, data scraping, lead generation, and other relevant skills.

Your main task will be to collect and collate data of migration agents from https://portal.mara.gov.au/search-the-register-of-migration-agents/ - To make searching easier, put SA in the Business Location search field. Do not include any agents outside of South Australia (SA).

The finished product should be delivered in Excel format with First Name, Surname, Company Name and Email Address. Completion required within 48 hours.”


Upon loading the Department of Home Affairs site, we’re greeted with a table and a bunch of filters.

Department of Home Affairs migration agents register website showing search interface with filters and data table

Censored since I’m not doing free advertising

Doing a search with SA in the filter gives us a table with 17 loaded record and 18 pages. So about 306 records total.

Search results showing 17 migration agents loaded on page 1 of 18 total pages for South Australia

We can see that the table is dynamically loading data because of the load between hitting page numbers and the page numbers themselves being links to functions (on hover), meaning you can’t tab through the pages:

Browser page showing pagination controls with JavaScript function links instead of direct URL links
Close-up of pagination interface showing page numbers as clickable function calls rather than standard web links

The other issue is that the contract asks that we grab the Email Address as well, which isn’t plainly displayed. You can only get the email address by hitting each of the surname/family name links in the left-most column.

Going into the network tab, we can take a look at what requests are being sent back and forth when hitting buttons on the page

Browser developer tools network tab showing HTTP requests made when clicking pagination buttons

The request itself contains the page we’re loading, and the amount of data to be loaded per page (PageSize: 17 and 17 rows from the previous observation. Bingo).

Using Firefox, I’ll take a copy of the command that was sent in Powershell:

Firefox developer tools showing Copy as PowerShell option for the HTTP request
PowerShell command copied from Firefox showing the complete curl command with headers and parameters

That’s a big ass command

Most of this is the raw data for the command, so lets take a look at the end and change the variables that we want, which are at the very end. Changing it to page 1, and that we want 500 results.

PowerShell command being modified to change page number to 1 and result count to 500

Ba da ba boom…

PowerShell terminal showing successful execution of the modified API request command

Pow!

Pop open terminal and wrap the command as a variable and let’s ask it to print the content. Like:

$(command).Content
PowerShell terminal showing JSON response data from the API request containing migration agent information

Rad, now we’re getting a big response, in JSON form.

Close-up view of the JSON response showing structured data with names, addresses, and contact details

Let’s pipe that into a JSON file.

PowerShell command piping the JSON response to a file, showing successful data extraction with all required fields

We have first name, last name, address, business name, phone numbers, and email addresses, and way more.

I won’t go into much more, but I know using jq and some scripting, I could deliver this as a CSV, or even better, format an excel/sheets file to take this raw data and fill other sheets with proper formating. I could also change the filtering so that we not only grab SA, but every single agency on record.


I didn’t end up getting hired for this job since, as is typical of Upwork and gig sites, even a $20 gig can be met with $2 offers from people in 3rd world countries. That’s a problem for another time.