Tag Archives: node.js

Gathering Net Salary Data with Puppeteer

Tax is one of those things that makes moving to a different country difficult, because it varies wildly between countries. How much do you need to earn in that country to maintain the same standard of living?

You can, of course, use an online salary calculator to understand how much net salary you’re left with after deducting tax and social security contributions, but this only lets you sample specific salaries and doesn’t really give you enough information to assess how the impact of tax changes as you earn more. Importantly, you can’t use these tools to draw a graph for each country and compare.

Malta Salary Calculator by Darren Scerri

Fortunately, however, these tools have already done the heavy lifting by taking care of the complex calculations. To build a graph, all we really need to do is to take samples at regular intervals, say, every 1,000 Euros. Since that is very tedious to do by hand, we’ll use a browser automation tool to do this for us.

Enter Puppeteer

Puppeteer, as the homepage says, “is a Node library which provides a high-level API to control Chrome or Chromium”, which is pretty much what we need for this job. The homepage also provides starter code to get us going. In a new folder, run the following to install the puppeteer dependency:

npm i puppeteer

Then, create a new file (e.g. netsalary.js) and add the starter code from the Puppeteer homepage. We’ll use this as a starting point:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

Getting Salary Data for Malta

In this particular exercise, we’ll get the salary data for Malta using Darren Scerri’s Malta Salary Calculator, which is relatively easy to work with.

Before we write any code, we need to understand the dynamics of the calculator. We do this via the browser’s developer tools.

Whenever you change the value of the gross salary input field (which has the “salary” id in the HTML), a bunch of numbers get updated, including the yearly net salary (the element with the “net-yearly-result” class), which is what we’re interested in.

Just by knowing how we can reach the relevant elements, we can write our first code to retrieve the input (gross salary) and output (yearly net salary) values to make sure we know what we’re doing:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  // Gross salary
  const grossSalaryInput = await page.$("#salary");
  const grossSalary = await page.evaluate(element => element.value, grossSalaryInput);
  console.log('Gross salary: ', grossSalary);
  
  // Net salary
  const netSalaryElement = await page.$('.net-yearly-result');
  const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);
  console.log('Net salary: ', netSalary);

  await browser.close();
})(); 

Here, we’re using the page.$() function to locate an element the same way we would using jQuery. Then we use the page.evaluate() function to get something from that element (in this case, the value of the input field). We do the same for the net salary, with the notable difference that in the page.evaluate() function, we get the textContent property of the element instead.

If we run this (node netsalary.js), we should get the same default values we see in the online salary calculator:

We managed to retrieve the gross and net salaries from the online calculator.
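As an aside, the two-step page.$() plus page.evaluate() pattern can be folded into a single call with page.$eval(), which finds the first element matching a selector and passes it to a function. The sketch below wraps it in a hypothetical helper, readText (my own name, not part of Puppeteer), and exercises it against a stand-in page object so that it runs without launching a browser; in the real script you would pass the Puppeteer page instead.

```javascript
// Hypothetical helper (not part of Puppeteer): read the text of the first
// element matching `selector`, using Puppeteer's page.$eval().
async function readText(page, selector) {
  const text = await page.$eval(selector, element => element.textContent);
  return text.trim();
}

// Stand-in for a Puppeteer page, only so this sketch runs without a browser.
// It mimics page.$eval() by calling the function with a fake element.
const fakePage = {
  $eval: async (selector, fn) => fn({ textContent: ' 22,805.44 ' })
};

readText(fakePage, '.net-yearly-result')
  .then(text => console.log('Net salary: ', text));
```

Bear in mind that page.$eval() throws if nothing matches the selector, so a typo in the class name fails loudly rather than quietly returning nothing.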

Text Entry

That was easy enough, but it used the default values that are present when the page is loaded. How do we manipulate the input field so that we can enter arbitrary gross salary values and later pick up the computed net salary?

The simplest way to do this is by simulating keyboard input as follows:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  const grossSalary = 30000;
  
  // Gross salary - keyboard input
  await page.focus("#salary");
  
  for (let i = 0; i < 6; i++) {
    await page.keyboard.press('Backspace');
  }
  
  await page.keyboard.type(grossSalary.toString());
  
  // Net salary
  const netSalaryElement = await page.$('.net-yearly-result');
  const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);
  console.log('Net salary: ', netSalary);

  await browser.close();
})(); 

Here, we:

  1. Focus the input field, so that whatever we type goes in there.
  2. Press backspace six times to erase any existing gross salary in the field (if you check the online calculator, you’ll see it can take up to six digits).
  3. Type in the string version of our gross salary, which is a hardcoded constant with a value of 30,000.

The result I get when I run this matches what the online calculator gives me. I guess I must be doing something right for once in my life.

Net salary:  22,805.44
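Note that what we read back is a string formatted with a thousands separator, not a number. If you want to do arithmetic on it later, a small helper along these lines could convert it. parseAmount is a hypothetical name, and the sketch assumes the calculator always uses commas for thousands and a dot for decimals, as in the output above.

```javascript
// Hypothetical helper: turn a formatted amount such as "22,805.44" into a
// number. Assumes commas are thousands separators and "." is the decimal
// point, as in the calculator output shown above.
function parseAmount(text) {
  const cleaned = text.replace(/,/g, '').trim();
  const value = Number(cleaned);
  if (cleaned === '' || Number.isNaN(value)) {
    throw new Error('Not a numeric amount: ' + JSON.stringify(text));
  }
  return value;
}

console.log(parseAmount('22,805.44'));
```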

Pulling Net Salary Data in a Range

So now we know how to enter a gross salary and read out the corresponding net salary. How do we do this at regular intervals within a range (e.g. every 1,000 Euros between 15,000 and 140,000)? Easy. We write a loop.

In practice, there’s a little timing issue between iterations, so I also needed to nick a very handy sleep function off Stack Overflow and put a very short delay after doing the keyboard input, to give it time to update the output values.

const puppeteer = require('puppeteer');

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  console.log('Gross Net');
  
  for (let grossSalary = 15000; grossSalary <= 140000; grossSalary += 1000) {
    // Gross salary - keyboard input
    await page.focus("#salary");
  
    // Erase whatever is currently in the field (up to six digits)
    for (let i = 0; i < 6; i++) {
      await page.keyboard.press('Backspace');
    }
  
    await page.keyboard.type(grossSalary.toString());
    // Short pause to let the page recalculate the output values
    await sleep(10);
  
    // Net salary
    const netSalaryElement = await page.$('.net-yearly-result');
    const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);

    console.log(grossSalary, netSalary);
  }

  await browser.close();
})(); 

This has the effect of outputting a pair of headings (“Gross Net”) followed by gross and net salary pairs:

Outputting the gross and net salaries in steps of 1,000 Euros (gross) at a time.
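The fixed 10 ms delay is a guess: too short and we read a stale value, too long and the run drags on. One alternative is to poll until the value actually differs from the previous reading, giving up after a timeout. The following is a minimal sketch of that idea. waitForChange and its parameters are my own names, and the read function stands in for whatever reads .net-yearly-result from the page. Puppeteer also has page.waitForFunction(), which may be a better fit if the condition can be checked inside the page itself.

```javascript
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: call read() repeatedly until it returns something
// other than `previous`, or fail once `timeoutMs` has elapsed.
async function waitForChange(read, previous, timeoutMs = 1000, intervalMs = 10) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await read();
    if (current !== previous) return current;
    if (Date.now() >= deadline) {
      throw new Error('Value did not change within ' + timeoutMs + ' ms');
    }
    await sleep(intervalMs);
  }
}

// Demo with a stand-in reader that returns a stale value twice before the
// "page" updates. The amounts are made up for illustration.
const readings = ['13,000.00', '13,000.00', '13,700.00'];
const fakeRead = async () => readings.length > 1 ? readings.shift() : readings[0];

waitForChange(fakeRead, '13,000.00')
  .then(value => console.log('Updated net salary:', value));
```

This relies on every step producing a different net salary from the previous one, which should hold for 1,000 Euro steps but is worth keeping in mind if you sample more finely.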

Making a Graph

Now that we have a program that spits out pairs of gross and net salaries, we can make a graph out of this data. First, we dump all this into a file.

node netsalary.js > malta.csv

Although this is technically not really CSV data, it’s still very easy to open in spreadsheet software. For instance, when you open this file using LibreOffice Calc, you get the Text Import screen where you can choose to use space as the separator. This makes things easier given that the net salaries contain commas.

Choose Space as the separator to load the data correctly.
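If you would rather produce real CSV, so that the file opens without fiddling with the import dialog, you could quote any field that contains a comma. A small sketch follows. toCsvLine is a hypothetical helper, and the quoting follows the usual CSV convention of wrapping such fields in double quotes and doubling any embedded quotes.

```javascript
// Hypothetical helper: join fields into one CSV line, quoting any field that
// contains a comma, a double quote or a line break.
function toCsvLine(fields) {
  return fields
    .map(field => {
      const text = String(field);
      return /[",\r\n]/.test(text)
        ? '"' + text.replace(/"/g, '""') + '"'
        : text;
    })
    .join(',');
}

console.log(toCsvLine(['Gross', 'Net']));
console.log(toCsvLine([30000, '22,805.44']));
```

In the scraping loop, console.log(toCsvLine([grossSalary, netSalary])) would take the place of console.log(grossSalary, netSalary).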

Once the data is in a spreadsheet, producing a chart is a relatively simple matter:

Graph showing how net salary changes with gross salary in Malta.

Now, this graph might look a little lonely, but you can already gather interesting insight by noticing its gradient and the fact that it isn’t entirely straight.
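To put a number on that gradient, we can work out how much of each additional slice of gross salary actually reaches our pocket, i.e. the change in net divided by the change in gross between consecutive samples. The sketch below assumes the samples have already been parsed into numbers. marginalKeepRates is a hypothetical name, and the figures in the example are invented for illustration, not real Malta data.

```javascript
// Hypothetical helper: for each pair of consecutive samples, compute the
// fraction of the extra gross salary that is kept as net salary.
function marginalKeepRates(samples) {
  const rates = [];
  for (let i = 1; i < samples.length; i++) {
    const extraGross = samples[i].gross - samples[i - 1].gross;
    const extraNet = samples[i].net - samples[i - 1].net;
    rates.push({
      from: samples[i - 1].gross,
      to: samples[i].gross,
      kept: extraNet / extraGross
    });
  }
  return rates;
}

// Made-up figures, NOT real salary data.
const samples = [
  { gross: 10000, net: 9000 },
  { gross: 11000, net: 9800 },
  { gross: 12000, net: 10500 }
];

for (const { from, to, kept } of marginalKeepRates(samples)) {
  console.log(from + '-' + to + ': keeps ' + (kept * 100).toFixed(1) + '%');
}
```

A perfectly straight line would give the same fraction for every step; where the fraction drops, a higher tax band has presumably kicked in.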

After doing this exercise for multiple countries, it’s fascinating to see how their lines compare when plotted on the same chart.

Aside from the allure of data analysis, I hope this article served to show how easy it is to use Puppeteer to perform simple browser automation, beyond the obvious UI automation testing.

A Gentle Introduction to Gulp

We’re at the end of 2015, and web technology has changed quite a bit since I started in 2002. Nowadays, for the front end stuff, there is a whole family of tools based on the node.js package manager (npm) that you can use to streamline and automate your workflow.

In this article (based on Windows), we’ll learn to use Gulp to do routine tasks such as concatenating and minifying JavaScript files. There’s another tool called Grunt with a similar purpose, and you’ll find all sorts of discussions on the internet comparing Grunt vs Gulp. Basically, Grunt is the older of the two and has a bigger community – an important factor considering that these tools are plugin-driven. However, I’m covering Gulp here as I felt it was more intuitive. For this small demonstration it has all the plugins we need, and performance (a common point of comparison) isn’t even a factor.

Setting up Gulp

The first thing we need is to install node.js.

There’s a chance you might already have node.js, if you installed it with Visual Studio 2015.

Once you have installed node.js, you should have npm in your path. Open a command prompt, and install Gulp using the following command:

npm install gulp -g

-g means globally, and thanks to this, gulp should now be in your path.

Next, we want to create a package.json file. This is a kind of project file for node.js-related stuff. We can use npm for this too:

npm init

npm will ask a bunch of questions in order to set up the package.json file, suggesting possible answers where it makes sense to do so. name and version are required, but you can leave the rest empty if you like.

Next, we need to install Gulp locally in our project:

npm install gulp --save-dev

This installs Gulp; --save-dev updates the package.json with a devDependencies field:

{
  "name": "gulptest",
  "version": "1.0.0",
  "description": "Learning to use Gulp.",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Daniel D'Agostino",
  "license": "ISC",
  "devDependencies": {
    "gulp": "^3.9.0"
  }
}

Plugins and the Gulp file

Gulp itself doesn’t do anything; it is just configured to run tasks. Its capabilities come from the plugins you install, and you configure it to do stuff using a Gulp file. For this simple example, we’re just going to use a few plugins:

npm install gulp-concat gulp-uglify --save-dev

Once again, --save-dev updates your devDependencies in package.json:

  "devDependencies": {
    "gulp": "^3.9.0",
    "gulp-concat": "^2.6.0",
    "gulp-uglify": "^1.5.1"
  }

Next, create a file called gulpfile.js, and put the following code in it:

var gulp = require('gulp'),
    uglify = require('gulp-uglify'),
    concat = require('gulp-concat');
    
gulp.task('default', function() {
  return gulp.src('js/*.js')
    .pipe(concat('all.js'))
    .pipe(gulp.dest('dist/'));
});

To test this out, I downloaded jQuery and jQuery UI, and put the uncompressed JavaScript files in a “js” folder. Having created the Gulp file above, all you need is to run Gulp:

gulp

You should find a folder called dist, with a file called all.js in it, containing the contents of the files originally in the js folder.
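If it helps to demystify what just happened, the net effect of this task is roughly what the plain Node script below does: read every .js file in a folder, join the contents and write the result to a single output file. This is only a conceptual sketch with a hypothetical helper name (concatFiles). Gulp itself works with streams and plugins rather than like this, and the demo uses a temporary folder with two tiny made-up files so that it doesn’t touch your project.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: concatenate all .js files in srcDir (sorted by name)
// into a single file at destFile, separated by newlines.
function concatFiles(srcDir, destFile) {
  const contents = fs.readdirSync(srcDir)
    .filter(name => name.endsWith('.js'))
    .sort()
    .map(name => fs.readFileSync(path.join(srcDir, name), 'utf8'));
  fs.mkdirSync(path.dirname(destFile), { recursive: true });
  fs.writeFileSync(destFile, contents.join('\n'));
}

// Demo in a temporary folder with two made-up source files.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'concat-demo-'));
const jsDir = path.join(root, 'js');
fs.mkdirSync(jsDir);
fs.writeFileSync(path.join(jsDir, 'a.js'), 'var a = 1;');
fs.writeFileSync(path.join(jsDir, 'b.js'), 'var b = 2;');

const allJs = path.join(root, 'dist', 'all.js');
concatFiles(jsDir, allJs);
console.log(fs.readFileSync(allJs, 'utf8'));
```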

Concatenating JavaScript is good for performance because the browser only needs to make a single request, rather than having to retrieve several small files. But we can do even better by minifying the JavaScript (using the gulp-uglify plugin). Just add the .pipe(uglify()) line to the task:

var gulp = require('gulp'),
    uglify = require('gulp-uglify'),
    concat = require('gulp-concat');
    
gulp.task('default', function() {
  return gulp.src('js/*.js')
    .pipe(concat('all.js'))
    .pipe(uglify())
    .pipe(gulp.dest('dist/'));
});

Run Gulp again, and you’ll find that all.js has been updated. In fact, it’s much smaller now, and it’s completely illegible.

Conclusion and Further Reading

The purpose of this article was to get you set up with Gulp, and see something working with the least possible hassle. Mark Goodyear’s article (on which this article is partly based) covers a lot of other common operations to carry out with Gulp. If you need to do anything in particular – linting your JavaScript files, minifying your CSS, using Less, etc. – there’s probably a plugin for it.

Beyond that, all you need to know is how to use Gulp effectively as part of your build process.

  • Running Gulp without arguments makes it look for the “default” task. You can pass the name of a task to run as an argument, allowing you to run a variety of operations.
  • How do you debug your minified JavaScript? You don’t. Use separate tasks for development and for release, and minify only in your release task.
  • Ideally these tasks should be run automatically as part of your continuous integration.
  • An ASP.NET 5 (formerly known as vNext) project in Visual Studio 2015 can easily integrate with npm tools, and you can configure it to run your tasks when you build.
  • Not using Windows? These command line tools are easy to use on other platforms (although installing npm will obviously be different).

Update 8th January 2016: Check out “More Gulp in Practice”, the follow-up to this article.