Gathering Net Salary Data with Puppeteer

Tax is one of those things that makes moving to a different country difficult, because it varies wildly between countries. How much do you need to earn in that country to maintain the same standard of living?

You can, of course, use an online salary calculator to understand how much net salary you’re left with after deducting tax and social security contributions, but this only lets you sample specific salaries and doesn’t really give you enough information to assess how the impact of tax changes as you earn more. Importantly, you can’t use these tools to draw a graph for each country and compare.

Malta Salary Calculator by Darren Scerri

Fortunately, however, these tools have already done the heavy lifting by taking care of the complex calculations. To build a graph, all we really need to do is to take samples at regular intervals, say, every 1,000 Euros. Since that is very tedious to do by hand, we’ll use a browser automation tool to do this for us.

Enter Puppeteer

Puppeteer, as the homepage says, “is a Node library which provides a high-level API to control Chrome or Chromium”, which is pretty much what we need for this job. It also gives us what we need to get started. In a new folder, run the following to install the puppeteer dependency:

npm i puppeteer

Then, create a new file (e.g. netsalary.js) and add the starter code from the Puppeteer homepage. We’ll use this as a starting point:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

Getting Salary Data for Malta

In this particular exercise, we’ll get the salary data for Malta using Darren Scerri’s Malta Salary Calculator, which is relatively easy to work with.

Before we write any code, we need to understand the dynamics of the calculator. We do this via the browser’s developer tools.

Whenever you change the value of the gross salary input field (which has the “salary” id in the HTML), several numbers get updated. Among them is the yearly net salary (the element with the “net-yearly-result” class), which is the one we’re interested in.

Just by knowing how we can reach the relevant elements, we can write our first code to retrieve the input (gross salary) and output (yearly net salary) values to make sure we know what we’re doing:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  // Gross salary
  const grossSalaryInput = await page.$("#salary");
  const grossSalary = await page.evaluate(element => element.value, grossSalaryInput);
  console.log('Gross salary: ', grossSalary);
  
  // Net salary
  const netSalaryElement = await page.$('.net-yearly-result');
  const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);
  console.log('Net salary: ', netSalary);

  await browser.close();
})(); 

Here, we’re using the page.$() function to locate an element the same way we would using jQuery. Then we use the page.evaluate() function to get something from that element (in this case, the value of the input field). We do the same for the net salary, with the notable difference that in the page.evaluate() function, we get the textContent property of the element instead.

If we run this (node netsalary.js), we should get the same default values we see in the online salary calculator:

We managed to retrieve the gross and net salaries from the online calculator.
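As a side note, Puppeteer also offers page.$eval(), which combines the element lookup and the evaluation in a single call. The sketch below wraps it in a readSalaries() helper. That name is my own invention rather than part of Puppeteer, and the selectors are the same ones used above.

```javascript
// Hypothetical helper: reads the current gross and net salary from the page.
// `page` is a Puppeteer Page; the selectors are the ones identified earlier.
async function readSalaries(page) {
  // page.$eval(selector, fn) runs fn against the first element matching selector
  const grossSalary = await page.$eval('#salary', element => element.value);
  const netSalary = await page.$eval('.net-yearly-result', element => element.textContent);
  return { grossSalary, netSalary };
}
```

Inside the async function, a call such as `const { grossSalary, netSalary } = await readSalaries(page);` could stand in for the separate page.$() and page.evaluate() calls.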

Text Entry

That was easy enough, but it used the default values that are present when the page is loaded. How do we manipulate the input field so that we can enter arbitrary gross salary values and later pick up the computed net salary?

The simplest way to do this is by simulating keyboard input as follows:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  const grossSalary = 30000;
  
  // Gross salary - keyboard input
  await page.focus("#salary");
  
  // Erase whatever is currently in the field (up to six digits)
  for (let i = 0; i < 6; i++) {
    await page.keyboard.press('Backspace');
  }
  
  await page.keyboard.type(grossSalary.toString());
  
  // Net salary
  const netSalaryElement = await page.$('.net-yearly-result');
  const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);
  console.log('Net salary: ', netSalary);

  await browser.close();
})(); 

Here, we:

  1. Focus the input field, so that whatever we type goes in there.
  2. Press backspace six times to erase any existing gross salary in the field (if you check the online calculator, you’ll see it can take up to six digits).
  3. Type in the string version of our gross salary, which is a hardcoded constant with a value of 30,000.

The result I get when I run this matches what the online calculator gives me. I guess I must be doing something right for once in my life.

Net salary:  22,805.44
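The fixed count of six backspaces is tied to this particular calculator. A slightly more general sketch asks the page how long the current value is and erases exactly that many characters. The setGrossSalary() name is hypothetical, and like the version above it assumes the caret sits at the end of the field once it has focus.

```javascript
// Hypothetical helper: replaces the contents of the gross salary field.
// `page` is a Puppeteer Page; '#salary' is the input identified earlier.
async function setGrossSalary(page, grossSalary) {
  await page.focus('#salary');

  // Ask the page how many characters are currently in the field
  const length = await page.$eval('#salary', element => element.value.length);

  // Erase exactly that many characters (assumes the caret is at the end)
  for (let i = 0; i < length; i++) {
    await page.keyboard.press('Backspace');
  }

  await page.keyboard.type(grossSalary.toString());
}
```

With this in place, `await setGrossSalary(page, 30000);` would cover the focus, erase, and type steps in one call.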

Pulling Net Salary Data in a Range

So now we know how to enter a gross salary and read out the corresponding net salary. How do we do this at regular intervals within a range (e.g. every 1,000 Euros between 15,000 and 140,000)? Easy. We write a loop.

In practice, there’s a little timing issue between iterations, so I also needed to nick a very handy sleep function off Stack Overflow and put a very short delay after doing the keyboard input, to give it time to update the output values.

const puppeteer = require('puppeteer');

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://maltasalary.com/');
  
  console.log('Gross Net');
  
  for (let grossSalary = 15000; grossSalary <= 140000; grossSalary += 1000) {
    // Gross salary - keyboard input
    await page.focus("#salary");
  
    // Erase the previous gross salary (up to six digits)
    for (let i = 0; i < 6; i++) {
      await page.keyboard.press('Backspace');
    }
  
    await page.keyboard.type(grossSalary.toString());
    await sleep(10);
  
    // Net salary
    const netSalaryElement = await page.$('.net-yearly-result');
    const netSalary = await page.evaluate(element => element.textContent, netSalaryElement);

    console.log(grossSalary, netSalary);
  }

  await browser.close();
})(); 

This has the effect of outputting a pair of headings (“Gross Net”) followed by gross and net salary pairs:

Outputting the gross and net salaries in steps of 1,000 Euros (gross) at a time.
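A fixed 10 ms delay worked for me, but it is a guess about how quickly the page updates. One alternative, sketched here under my own assumptions, is to poll until the net salary differs from the previous reading, with a timeout as a safety net. The waitForChange() helper is hypothetical, and it only detects that the output has changed, not that it has finished changing.

```javascript
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: calls `read` repeatedly until it returns something
// other than `previous`, or until `timeoutMs` has elapsed.
async function waitForChange(read, previous, timeoutMs = 1000, intervalMs = 10) {
  const deadline = Date.now() + timeoutMs;
  let current = await read();
  while (current === previous && Date.now() < deadline) {
    await sleep(intervalMs);
    current = await read();
  }
  // Either the new value, or the unchanged one if the timeout was reached
  return current;
}
```

In the loop, `read` could be a function such as `() => page.$eval('.net-yearly-result', element => element.textContent)`, and `previous` the net salary from the previous iteration.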

Making a Graph

Now that we have a program that spits out pairs of gross and net salaries, we can make a graph out of this data. First, we dump all this into a file.

node netsalary.js > malta.csv

Although this is technically not really CSV data, it’s still very easy to open in spreadsheet software. For instance, when you open this file using LibreOffice Calc, you get the Text Import screen where you can choose to use space as the separator. This makes things easier given that the net salaries contain commas.

Choose Space as the separator to load the data correctly.
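If you would rather produce real CSV, the thousands separators have to go before the values are joined with commas. This is a minimal sketch with two helper names of my own (parseAmount and toCsvRow), and it assumes the net salary is always formatted like the 22,805.44 we saw earlier.

```javascript
// Hypothetical helper: turns a formatted amount such as "22,805.44"
// into a plain number by dropping the thousands separators.
function parseAmount(text) {
  return Number(text.replace(/,/g, '').trim());
}

// Builds one comma-separated row from a gross salary and the net salary
// text read from the page.
function toCsvRow(grossSalary, netSalaryText) {
  return `${grossSalary},${parseAmount(netSalaryText)}`;
}

console.log(toCsvRow(30000, '22,805.44')); // prints 30000,22805.44
```

Note that converting to a number drops trailing zeros, so 19,000.00 would come out as 19000.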

Once the data is in a spreadsheet, producing a chart is a relatively simple matter:

Graph showing how net salary changes with gross salary in Malta.

Now, this graph might look a little lonely, but you can already gather interesting insight by noticing its gradient and the fact that it isn’t entirely straight.
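To put a number on that gradient, we can work out how much of each extra 1,000 Euros of gross salary is actually kept as net salary. The following is a sketch only: the gradients() function is my own, and the sample figures are made up for illustration rather than taken from the calculator.

```javascript
// Computes, for each pair of consecutive samples, the share of the extra
// gross salary that is kept as net salary (the gradient of the line).
function gradients(samples) {
  const result = [];
  for (let i = 1; i < samples.length; i++) {
    const extraGross = samples[i].gross - samples[i - 1].gross;
    const extraNet = samples[i].net - samples[i - 1].net;
    result.push({
      from: samples[i - 1].gross,
      to: samples[i].gross,
      kept: extraNet / extraGross,
    });
  }
  return result;
}

// Made-up figures for illustration only, not taken from the calculator
const samples = [
  { gross: 10000, net: 9000 },
  { gross: 11000, net: 9800 },
  { gross: 12000, net: 10500 },
];
console.log(gradients(samples));
```

A falling `kept` value from one pair to the next is what shows up in the chart as the line flattening out.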

After doing this exercise for multiple countries, it’s fascinating to see how their lines compare when plotted on the same chart.

Aside from the allure of data analysis, I hope this article served to show how easy it is to use Puppeteer to perform simple browser automation, beyond the obvious UI automation testing.

Jack of All Trades, Master of None

“Jack of all trades, master of none,” I’ve heard IT industry professionals scoff arrogantly several times in the past. That was their judgement of polyglot programmers, full stack developers, or any other people who had dabbled in more than one area of the kind of work we do.

Full stack. Heh heh. I took this picture at an eatery in San Francisco in February 2017.

I get where they’re coming from. Our industry is a very complicated one, and it’s really hard to learn much of anything if you don’t focus. Whether we’re talking about backend, frontend, databases, NoSQL and so on, there is an overwhelming number of technologies to discover and learn, and the information overload on the internet doesn’t help digest them.

The conventional wisdom is to be a “jack of all trades, master of one”, meaning you learn a number of different things but try to excel in at least one of them. This is great advice, but I’ve very rarely seen it happen in practice. People tend to either specialise in one thing in depth, or have a superficial knowledge of different things. Which of these would you choose?

Being just a master of one is a real problem I’ve seen for a long time. People who have only focused on one programming paradigm throughout their training and career tend to have trouble thinking outside the box and finding different ways to solve problems. Backend and frontend skills don’t easily transfer across, and most full stack developers are typically much stronger at one of these than the other. Most developers don’t understand enough about security, infrastructure and architecture.

The way I see it, that’s pretty bad. A developer who knows nothing about servers is a bit like a centre-forward who is so good that he can dribble past the opposing team’s defence, but fails to score every time.

That’s why even a semi-decent University education in this field doesn’t just teach programming. Learning CPU architecture, systems programming, Assembly language or C gives an idea of what happens under the hood and teaches a certain appreciation of resources and doing things efficiently, a perspective that people having only experience with high-level languages often dismiss. Having a basic grasp of business and management prevents developers from idolising their code (“clean code” anyone?) and helps them to focus on solving real problems. Understanding a little about infrastructure helps them better understand architectural and security concerns that code-focused developers will often ignore.

Universities, in fact, produce instances of “jack of all trades, master of none”. Let’s face it: when you graduate, although your scores might give you confidence, you don’t really know much of anything. But thanks to your holistic training, you’re able to understand a variety of different problems, and gradually get a deeper understanding of what you need to solve them.

I also think there is a place for the “master of one” type of people who specialise very strongly in one area and mostly ignore the rest. I imagine these would be academics, or R&D specialists in big companies like Google. But, as much as we like drawing inspiration and borrowing ideas from successful companies, we always have to keep the context in mind. Who are we kidding? Most of us mere mortals don’t work at companies anything like Google.

So what I’m saying here is: it’s not bad to be a “jack of all trades, master of none” in our field. It is obviously better if you’re also “master of one”, but if I had to choose, I’d be (or hire) the one with the broad but superficial knowledge. Because when you are aware that something exists, then it’s not a huge leap to research it in more detail and get to the depth you need. It’s actually a good strategy to learn depth on an as-needed basis, especially given that you’ll never have enough time to learn everything in depth.

As in life, I’d rather know a little bit about many things that interest me, than bury myself in one thing and otherwise give the impression that I’ve been living under a rock.

Enabling and Enforcing HTTPS on a Subdomain with cPanel

Nowadays, there’s really no excuse not to enable HTTPS on a website, even a small personal one. It’s free and simple. In fact, chances are that whatever host you’re using offers a simple option you can just turn on. In this article, we’ll see how to set this up in cPanel, which is commonly used in Linux/PHP/MySQL web hosting services.

Set up the Subdomain

Subdomains service in cPanel

If you haven’t already, create a subdomain. To do this:

  1. Locate the Subdomains service in cPanel.
  2. Enter a name for the subdomain.
  3. Enter a path to a folder to be used as the document root for the subdomain.
  4. Click the Create button.

Enable HTTPS on the Subdomain

Let’s Encrypt™ SSL service in cPanel

New subdomains will by default run on HTTP, which is insecure. Enabling HTTPS requires an SSL or TLS certificate. To set this up:

  1. Locate the Let’s Encrypt™ SSL service in cPanel.
  2. Scroll towards the bottom of the page, and page through your subdomains until you locate the new one you want to apply HTTPS to.
  3. Click on the Issue action link next to it.
  4. Leave the settings as they are and click on the Issue button.

Enforce HTTPS on the Subdomain

Domains service in cPanel

Enabling HTTPS is only half good if people can still access the site insecurely over HTTP. It’s very easy to automatically redirect people from the HTTP endpoint to HTTPS. To do this:

  1. Locate the Domains service in cPanel.
  2. Locate the new subdomain, which may be on a different page.
  3. Turn on the switch in the Force HTTPS Redirect column.
  4. A success message should confirm that it’s been enabled.

Test the Subdomain

The subdomain is secure and running on HTTPS

To make sure everything is set up correctly, use a browser to ensure that the website at your subdomain is secure.

  1. Wait a few seconds. The redirect you just enabled might not kick in right away.
  2. Use an incognito session in your browser. Otherwise, if you visited the subdomain before enabling the redirect, it’s possible that the browser might still show it as insecure.
  3. Access your subdomain with the URL starting with https://. Ensure that your browser displays the padlock icon and reports the connection as secure.
  4. Access your subdomain with the URL starting with http://. Once the page loads, ensure that you are now on https:// and that the browser displays the padlock icon and reports the connection as secure. Optionally, you can also open your browser’s dev tools, switch to the Network tab, and observe a 301 redirect request.
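The last check can also be scripted. The sketch below is one way to do it rather than anything cPanel provides. It assumes Node 18 or later (where fetch() is built in), and both function names are hypothetical.

```javascript
// Returns true when a response status and Location header describe a
// permanent redirect to an https:// URL.
function isHttpsRedirect(status, location) {
  return status === 301 && typeof location === 'string' && location.startsWith('https://');
}

// Hypothetical check against a live site. Assumes Node 18 or later, where
// fetch() is built in; redirect: 'manual' stops fetch from following the
// redirect so that the 301 response itself can be inspected.
async function checkRedirect(httpUrl) {
  const response = await fetch(httpUrl, { redirect: 'manual' });
  return isHttpsRedirect(response.status, response.headers.get('location'));
}
```

Calling `checkRedirect()` with the http:// URL of your own subdomain should resolve to true once the redirect is active.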

Summary

As you can see, it’s super easy to get HTTPS working on a subdomain in cPanel. Just enable HTTPS for the subdomain, force the HTTPS redirect, and you’re done.

Azure Fundamentals Part 5 Summary

This is a summary of the Azure Fundamentals part 5: Describe identity, governance, privacy, and compliance features learning path. Aside from the usual “Introduction to Azure Fundamentals” module, repeated in every learning path in this series, there are three modules covering identity, cloud governance, and compliance, respectively. If you’re a developer, this learning path is easily the most boring of the lot, but it’s also very important from a cloud administration point of view.

Identity Services

This is a summary of the Secure access to your applications by using Azure identity services module. They love using wordy headings, don’t they?

Authentication vs authorization: who you are vs what you have access to.

Azure Active Directory (Azure AD):

  • Similar to Active Directory, but for the cloud
  • Monitors sign-in attempts, unlike the on-premises counterpart
  • Controls access to other Microsoft services such as Office 365
  • Has the concept of tenants, which represent organisations
  • Is an identity and access management service. It stores information about users (including passwords), and provides control over them (e.g. reset password, multifactor authentication, list of banned passwords, etc)
  • Also provides device management – devices can be registered to control which devices are allowed to access services.
  • Supports Single sign-on (SSO) to access multiple applications with the same credentials.
  • Azure AD Connect synchronises user identities between on-premises Active Directory and Azure AD. Users can use their same credentials to access on-premises and cloud services.

Multifactor authentication provides an additional layer of security over the usual username and password by requiring two or more authentication mechanisms, typically from the following categories:

  • Something the user knows (e.g. username and password)
  • Something the user has (e.g. code sent to mobile device)
  • Something the user is (biometric data, e.g. fingerprint)

Conditional access is a feature of Azure AD that applies multifactor authentication differently based on identity signals. This is basically a rule engine that can do things like request the second factor only if the user is in an unknown location, signing in from an unknown device, or accessing a particular application. Access could also be blocked entirely in some circumstances (e.g. signing in from a high-risk country). Conditional access is a premium feature that requires a special Azure AD licence.

Cloud Governance

This is a summary of the Build a cloud governance strategy on Azure module.

The Cloud Adoption Framework for Azure guides you towards migrating to the cloud. There are five steps:

  • Define your strategy: understand what benefits you’ll gain by moving to the cloud, get everyone on board, and choose the right proof of concept project to kick it off.
  • Make a plan: take stock of what you have on-premises, train up, and make a plan to migrate.
  • Ready your organisation: set up your Azure subscriptions and create a landing zone, basically an environment in the cloud to get you started.
  • Adopt the cloud: start migrating, review best practices, find ways to migrate more efficiently, and study ways to handle more complex migrations.
  • Govern and manage your cloud environments: define processes and policies that will apply to resources in the cloud, and maintain them as they evolve throughout the migration process.

Things to consider when deciding how to organise Azure subscriptions:

  • Billing: you can create one billing report per subscription, so you can organise subscriptions by department or project.
  • Access control: subscriptions provide inherent isolation (e.g. between development and production environments).
  • Subscription limits: some resources are limited in the amount you can deploy per subscription, so you’ll need to allocate more subscriptions if necessary.

Role-based access control (RBAC) is used to grant or restrict access to resources. These roles are applied to a scope that could be:

  • A management group
  • A single subscription
  • A resource group
  • A single resource

Access control is inherited by child scopes, e.g. assigning a role to a single subscription means it is also applied to all resource groups and resources in that subscription.

RBAC is managed via Access control (IAM) in the Azure portal. RBAC rules are applied to any request to an Azure resource that passes through the Azure Resource Manager.

RBAC uses an allow model, so as long as you have a role that allows you to perform an action, you can do it; and if different roles give you different access (e.g. read and write), then they sum up (e.g. you get both read and write).
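To make the inheritance and the additive behaviour concrete, here is a toy model. It is not the Azure API, and it doesn't cover anything beyond what's described above. The scope names and the effectiveActions() function are made up.

```javascript
// Toy model only: scopePath lists the scopes from the outermost to the
// innermost, and each role assignment grants a set of actions at one scope.
function effectiveActions(scopePath, assignments) {
  const actions = new Set();
  for (const assignment of assignments) {
    // An assignment applies at its own scope and at every child scope
    if (scopePath.includes(assignment.scope)) {
      for (const action of assignment.actions) {
        actions.add(action); // allow model: permissions add up
      }
    }
  }
  return actions;
}

// Hypothetical scope names
const path = ['management-group', 'subscription-a', 'resource-group-1', 'vm-1'];
const assignments = [
  { scope: 'subscription-a', actions: ['read'] },
  { scope: 'resource-group-1', actions: ['write'] },
  { scope: 'subscription-b', actions: ['delete'] },
];
console.log([...effectiveActions(path, assignments)]); // prints [ 'read', 'write' ]
```

The role assigned on the unrelated subscription contributes nothing, while the two roles along the path to the VM add up to read and write.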

Resource locks are a simple setting against accidental modification or deletion. You can use either CanNotDelete (authorised users can read or write but not delete) or ReadOnly (authorised users can read a resource but can’t change or delete it). You can remove the lock to perform the restricted operation (e.g. to delete the resource).

You can use Azure Blueprints (more on this further below) to set a standard for resources across your organisation, which could include enforcement of resource locks among other things.

Resource tags are used to apply metadata to resources. They complement subscriptions and resource groups as another way to categorise and organise things. They help to:

  • Manage resources and locate them easily
  • Report on costs by particular tags
  • Group resources based on criticality and SLAs
  • Classify data security (e.g. confidential)
  • Regulatory compliance (e.g. ISO27001)
  • Run any kind of automation logic on resources with a particular tag

Azure Policy lets you create and enforce policies or initiatives (groups of policies) that apply to resources. To implement a policy, you:

  1. Create a policy definition
  2. Assign it to resources
  3. Review the evaluation results

A policy definition can be used to do things like:

  • Prevent VMs from being deployed in certain regions
  • Restrict which virtual machine sizes can be deployed
  • Enforce MFA on accounts with write permissions
  • Prevent CORS from allowing unrestricted access to web applications
  • Ensure updates are installed on VMs

Azure Blueprints lets you orchestrate things like role assignments, policy assignments, ARM templates and resource groups across your organisation so that you don’t need to set them up for each subscription. Blueprints are made up of artifacts, and they deploy different elements to each subscription (e.g. Allowed locations policy, resources from an ARM template, etc).

Data Protection & Compliance

This is a summary of the Examine privacy, compliance, and data protection standards on Azure module.

Some projects require compliance with certain standards, such as ISO 27001 or government-specific regulations. Azure is compliant with a huge number of these, so it’s quite likely you can use Azure even when working in some of the more regulated sectors.

You can also check the following documents:

  • Microsoft Privacy Statement: how Microsoft manages personal data
  • Online Services Terms: agreement between customer and Microsoft when using services such as Azure or Office 365
  • Data Protection Addendum: more specific about data protection

The Trust Center lets you find information about particular compliance offerings, such as ISO 27001, and how it applies to cloud services on Azure.

The Azure compliance documentation describes how Azure adheres to certain standards, e.g. PCI DSS.

Azure Government is a separate Azure offering for the US government. It has the highest level of security, and data centres are physically isolated so they can’t be used by you and me outside the scope of the US government.

Azure China 21Vianet is the Azure offering in China. Microsoft can’t operate Azure directly in China because of local regulations, so they instead offer it via a partner, 21Vianet. Services offered are mostly the same, but they may vary a little.

Azure Fundamentals Part 4 Summary

This is a summary of the Azure Fundamentals part 4: Describe general security and network security features learning path. Aside from the usual “Introduction to Azure fundamentals” module repeated in every learning path in the series, there are only a couple of other modules on general and network security, respectively.

General Security

This is a summary of the Protect against security threats on Azure module.

Azure Security Center is a service that gives you visibility into the overall security of your Azure and on-premises services, referred to as your security posture. It provides ratings against different regulatory benchmarks such as Azure CIS or PCI DSS, and also provides an overall secure score. The Resource security hygiene section provides a breakdown of security warnings by service type.

Azure Security Center also provides additional security capabilities including:

  • Permitting temporary access to VMs that would normally be blocked to outside traffic
  • Controlling which applications can run on VMs
  • Recommendations for hardening network security groups
  • Monitoring system files on both Windows and Linux against tampering
  • Integration with Azure Logic Apps to automatically trigger actions based on threat detection alerts or Security Center recommendations.

Azure Sentinel is a security analytics service (the more formal term would be security information and event management (SIEM) system). It can:

  • Collect security information from different sources
    • Microsoft services such as Office 365 or Azure Active Directory
    • Non-Microsoft services such as AWS CloudTrail or Okta SSO
    • Other sources that use recognised formats including Common Event Format (CEF), Syslog, or REST API
  • Detect threats based on built-in or custom rules
  • Investigate incidents or suspicious activity
  • Use Azure Monitor Workbooks to automate responses to threats

Azure Key Vault is another security-related service used to store secrets, including passwords, encryption keys, and certificates. These secrets can also be protected by hardware security modules (HSMs). Access to the secrets can be easily monitored.

Azure Dedicated Host is a special VM offering where you have sole access to the physical hardware (as opposed to normal VMs which are shared). This can sometimes be required for compliance reasons.

  • A host group contains multiple dedicated hosts for high availability, similar to VM scale sets.
  • Maintenance control provides control over when regular maintenance updates occur, within a 35-day rolling window.
  • Pricing is per dedicated host, not per VM running on it. Additional charges apply for software licensing, storage, and network usage.

Network Security

This is a summary of the Secure network connectivity on Azure module.

Defence in depth refers to multiple layers of defence including:

  • Physical security: physical access to the data centre.
  • Identity & access: control access to infrastructure and change control. This includes use of SSO and multifactor authentication, as well as auditing events and changes.
  • Perimeter: DDoS protection and perimeter firewalls.
  • Network: use access control to limit communication between resources, and ensure any external connectivity (e.g. to on-premises networks) is secure.
  • Compute: secure access to VMs and ensure they have the latest security updates.
  • Application: ensure applications are free of vulnerabilities, and store secrets securely.
  • Data: store and transmit data securely, whether it’s in a database, VM disk, SaaS application (e.g. Office 365) or in other cloud storage.

Data protection is based on the CIA principles:

  • Confidentiality: Use the principle of least privilege to give access only to those who really need it. Protect secrets and resources from unauthorised access.
  • Integrity: Protect data at rest and in transit from tampering. Hash algorithms are usually used to verify whether data has changed.
  • Availability: Ensure services are able to run and that access to their data is not compromised, e.g. by DDoS attacks.
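As a small illustration of the integrity point, this sketch uses Node’s built-in crypto module to record a SHA-256 digest and later check whether the data still matches it. The helper names and sample strings are my own, and the check is only as trustworthy as the place where the digest is kept.

```javascript
const { createHash } = require('crypto');

// Returns the SHA-256 digest of some data as a hex string
function sha256(data) {
  return createHash('sha256').update(data).digest('hex');
}

// Compares the data against a digest recorded earlier
function isUnchanged(data, expectedDigest) {
  return sha256(data) === expectedDigest;
}

const original = 'gross=30000';
const digest = sha256(original);
console.log(isUnchanged(original, digest));      // prints true
console.log(isUnchanged('gross=90000', digest)); // prints false
```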

Azure Firewall is a highly available and scalable stateful firewall used to protect resources within virtual networks. It can be configured to allow or deny traffic based on rules including:

  • Source IP address
  • Protocol
  • Destination port
  • Destination address
  • Which domains can be accessed from a subnet

Network Address Translation (NAT) rules can also be configured in Azure Firewall.

Azure Application Gateway, Azure Front Door and Azure Content Delivery Network offer a different kind of firewall known as web application firewall (WAF), which provides protection tailored to web applications.

Azure DDoS Protection resists attempts to overwhelm or overallocate resources by flooding them with requests. This is available in two tiers:

  • Basic: free and automatically enabled. The Azure global network is used to distribute and mitigate attack traffic across Azure regions; it ensures that Azure infrastructure is not affected by DDoS attacks. Includes always-on traffic monitoring and real-time mitigation of common network-level attacks.
  • Standard: provides additional protection for virtual network resources linked to public IP addresses. Adapts mitigation measures via dedicated traffic monitoring and machine learning algorithms.

DDoS Protection can help prevent the following types of attacks:

  • Volumetric attacks: flood the network layer with requests.
  • Protocol attacks: exploit weaknesses in layer 3 or 4 protocols.
  • Resource/application-layer attacks (only with web application firewall): target HTTP endpoints that are relatively slow to process, so many such requests ultimately overwhelm the server and make it unable to process additional requests. This requires the HTTP-aware WAF to mitigate.

Network security groups (NSGs) are like internal firewalls. Whereas Azure Firewall controls what traffic comes from outside, NSGs can be used to allow or deny traffic between resources in a virtual network, based on things like source/destination IP (single address or range), protocol (TCP, UDP or both) and direction (incoming or outgoing traffic).
