This post is the second part of the MEAN tutorial series. Today we look at the crawler logic, built with cheerio (a Node.js library).
In app.js, load the crawler module:
var crawler = require('./routes/crawler.js');
Next, we set a timer to fetch the page source:
setInterval(function () {
  console.log('trying crawler in: ' + new Date());
  download(options, function downloadResult(data) {
    if (data) {
      findKeywordsAndusers(data, parseHtml);
    } else {
      console.log('error');
    }
  });
}, 1 * 60 * 1000);
The code above calls the download function once a minute. When the download returns data, its callback calls findKeywordsAndusers, passing parseHtml as the callback; otherwise it logs an error.
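The download function itself is not shown in this post. As a rough sketch only, assuming it wraps Node's built-in http module and passes null to the callback on any failure, it could look something like this. The values in options are placeholders, not the ones the app uses.

```javascript
const http = require('http');

// Placeholder target; the post does not show the real `options` object.
const options = { host: 'example.com', port: 80, path: '/' };

// Fetch a page and pass its body to `callback`, or null on any failure.
function download(options, callback) {
  let finished = false;
  // Make sure the callback fires only once, even if an error follows a response.
  function finish(result) {
    if (!finished) {
      finished = true;
      callback(result);
    }
  }

  http.get(options, function (res) {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      return finish(null);
    }
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { finish(body); });
  }).on('error', function () {
    finish(null);
  });
}
```

Returning null instead of throwing keeps the timer callback simple: the `if (data)` check is the only error handling it needs.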
The main logic is in the parseHtml function:
// load the HTML returned by the download function
var $ = cheerio.load(data);
// iterate over each 'li' element and read the text of the link inside its 'h3'
$("li").each(function (i, e) {
  var title = $(e).find("h3>a").text();
});
Once we have the elements we need, the rest of the logic retrieves the keywords, matches them against each title, and sends an e-mail alert.
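The matching step is not listed here either. A minimal sketch, assuming a simple case-insensitive substring match, might look like the following. The matchKeywords helper and the sample titles and keywords are made up for illustration; in the app the titles come from cheerio and the keywords from findKeywordsAndusers.

```javascript
// Hypothetical helper: return the keywords that occur in `title`, ignoring case.
function matchKeywords(title, keywords) {
  const haystack = title.toLowerCase();
  return keywords.filter(function (keyword) {
    return haystack.includes(keyword.toLowerCase());
  });
}

// Made-up sample data for illustration.
const titles = ['Node.js 4.0 released', 'Weather report'];
const keywords = ['node.js', 'mongodb'];

titles.forEach(function (title) {
  const hits = matchKeywords(title, keywords);
  if (hits.length > 0) {
    // This is where the app would send the e-mail alert.
    console.log('alert for "' + title + '": ' + hits.join(', '));
  }
});
```

A substring match is the simplest option; it also matches inside longer words, so a word-boundary check may be a better fit for short keywords.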