Using jQuery in Node with jsdom

After watching a ton of Node.js tutorials (and TAing for a JS class), I decided a while ago: “For my next script, I’m totally going to use Node.”

So this last week I finally got the opportunity to write one. I was tasked with a menial job, and writing a script to accomplish it brightened my day.

The first script dealt with an XML API feed. I immediately found xml2js, a nice converter, and set about looping through some API URLs, collecting the data I needed, and totaling it up. It was a mess, and looked like this:

var https = require("https");
var parseString = require('xml2js').parseString;

// running totals, updated as each response arrives
var totalEntries = 0;
var something = 0;

https.get("https://someplace/someapi", function(response){

	var body = '';
	response.on("data", function(chunk) {
		body += chunk;
	});

	response.on("end", function(){
		//console.log(body);
		parseString(body, function (err, result) {
			if (err) {
				console.error(err);
				return;
			}
			totalEntries += result.feed.entry.length;
			for(var i=0; i < result.feed.entry.length; i++){
				// xml2js puts element attributes under `$`
				something += parseInt(result.feed.entry[i]['something'][0]['somethingelse'][0].$.thingiwant, 10);
			}
			console.log("Total stuff: " + something);
		});
	});
});

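It was easy to get what I needed this way, but it’s clearly not the right way to do it. Because the requests complete asynchronously, each callback just prints a running total whenever its response happens to arrive, so the real answer is whichever line prints last. Blah blah blah, that’s not what I’m writing about.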
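For what it’s worth, here is a minimal sketch of one way around the async problem: wrap each request in a Promise and only report once Promise.all says every request has finished. The names fetchBody, totalAcross, and countOne are made up for illustration and are not part of the script above.

```javascript
var https = require("https");

// Fetch a URL and resolve with the full response body as a string.
function fetchBody(url) {
	return new Promise(function (resolve, reject) {
		https.get(url, function (response) {
			var body = "";
			response.setEncoding("utf8");
			response.on("data", function (chunk) { body += chunk; });
			response.on("end", function () { resolve(body); });
		}).on("error", reject);
	});
}

// Run one task per URL and resolve with the sum once ALL of them are done.
// `countOne` is a hypothetical function: url -> Promise<number>.
function totalAcross(urls, countOne) {
	return Promise.all(urls.map(countOne)).then(function (counts) {
		return counts.reduce(function (sum, n) { return sum + n; }, 0);
	});
}
```

With a countOne that chains fetchBody(url) into the xml2js parsing step, the grand total would be logged exactly once, in a final .then(), instead of once per response.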

The next one was very similar, but I had to scrape a web page, not just XML data. So I found a nice lib called jsdom, which builds a DOM that I can run jQuery against.

var jsdom = require("jsdom");

// `url` and `host` are assumed to be defined earlier in the script.
// Note: jsdom.env() is the older (pre-v10) jsdom API.
jsdom.env(url, function(errors, window){
	var $ = require("jquery")(window);
	var total = 0;

	$(".some_class").each(function(key, value){
		// the link is buried in the onclick, so I'll have to use a regex regardless...
		var link = value.innerHTML.match(/newWindow\('([^']*)'/);
		if(link === null){
			return; // no link in this element, skip it
		}
		jsdom.env(host + link[1], function(errors, window){ // link[1] is the first grouping
			var $ = require("jquery")(window);
			// use regex to get the xxxxxxx because I'm lazy
			var result = $('head').html().match(/someRegex/g);
			if(result !== null){
				for(var i = 0; i < result.length; i++){
					var thing = result[i].match(/"([^"]*)"/)[1]; // get first grouping
					// match() returns strings; convert (assuming the value is numeric) before adding
					total += parseInt(thing, 10);
				}
			}
		});
	});
});
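The regex step is the fragile part of this script, since match() returns null when nothing matches. A small null-safe helper is one way to handle that. This is only a sketch: the firstGroup name and the sample markup are made up for illustration.

```javascript
// Return the first capture group of a non-global `pattern` in `text`,
// or null if the pattern does not match at all.
function firstGroup(text, pattern) {
	var match = text.match(pattern);
	return match === null ? null : match[1];
}

// Sample markup, invented for this example.
var sample = "<a onclick=\"newWindow('/detail/42')\">open</a>";
var path = firstGroup(sample, /newWindow\('([^']*)'/);
console.log(path); // prints /detail/42
```

Checking the result for null before using it avoids the TypeError that indexing straight into a failed match would throw.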

It was super easy, and super powerful, to use something I’m already so familiar with for a task it’s so well suited to. The scripts themselves took minutes to write, if you don’t count the time I spent figuring out where to find the data I needed.

1 Comment

  1. JaZahn says:

    Yes. Phantom would have been a better choice.
