<pedrocorreia.net ⁄>
 

<Pure JavaScript HTML Parser ⁄ >




clicks: 1290 1290 2008-05-06 2008-05-06 goto programacao myNews programacao  Bookmark This Bookmark This


Recently I was having a little bit of fun and decided to go about writing a pure JavaScript HTML parser. Some might remember my one project, env.js, which ported the native browser JavaScript features to the server-side (powered by Rhino). One thing that was lacking from that project was an HTML parser (it parsed strict XML only).

I've been toying with the ability to port env.js to other platforms (Spidermonkey derivatives and the ECMAScript 4 Reference Implementation) and if I were to do so I would need an HTML parser. Because of this fact it became easiest to just write an HTML parser in pure JavaScript.

I did some digging to see what people had previously built, but the landscape was pretty bleak. The only one that I could find was one made by Erik Arvidsson - a simple SAX-style HTML parser. Considering that this contained only the most basic parsing - and none of the actual, complicated, HTML logic there was still a lot of work left to be done.

(I also contemplated porting the HTML 5 parser, wholesale, but that seemed like a herculean effort.)

However, the result is one that I'm quite pleased with. It won't match the compliance of html5lib, nor the speed of a pure XML parser, but it's able to get the job done with little fuss - while still being highly portable.



este é só um excerto do artigo, para aceder ao artigo completo, clique no link em baixo:
this is just a small excerpt from the article, to access the full article please click in the link below:

http://ejohn.org/blog/pure-javascript-html-parser/




Subscribe News RSS  Subscribe News Updates by E-mail





myNews <myNews show="rand" cat="programacao" ⁄>

RouterJs: easy routing for your ajax Web applications new ...

RouterJs is a simple router for your ajax web apps. It's build upon History.js which means that Rout (...)

clicks: 2720 2720 2012-05-14 2012-05-14 goto url (new window) haithembelhaj.g... goto myNews programacao


Backbone computed properties new ...

This gist shows one way to implement read- and write-enabled computed properties on a Backbone Model (...)

clicks: 2666 2666 2012-05-13 2012-05-13 goto url (new window) https://gist.gi... goto myNews programacao


Android Query new ...

Android-Query (AQuery) is a light-weight library for doing asynchronous tasks and manipulating UI el (...)

clicks: 2494 2494 2012-05-12 2012-05-12 goto url (new window) code.google.com... goto myNews programacao


Create Instagram Filters With PHP new ...

In this tutorial, I'll demonstrate how to create vintage (just like Instagram does) photos with PHP (...)

clicks: 2352 2352 2012-05-12 2012-05-12 goto url (new window) net.tutsplus.co... goto myNews programacao


HTML5 jQuery Paint Plugin new ...

Websanova Paint is a HTML5 canvas based jQuery plugin. It allows you to free paint on a canvas area (...)

clicks: 6498 6498 2012-05-12 2012-05-12 goto url (new window) websanova.com/t... goto myNews programacao


Sass vs. LESS vs. Stylus: Preprocessor Shootout new ...

CSS3 preprocessors are languages written for the sole purpose of adding cool, inventive features to (...)

clicks: 2253 2253 2012-05-11 2012-05-11 goto url (new window) net.tutsplus.co... goto myNews programacao


Real-time Applications With Node.js and Socket.IO new ...

Hey everyone! Sorry about the long pause since the last blog post, life has been quite hectic for th (...)

clicks: 2486 2486 2012-05-11 2012-05-11 goto url (new window) codingcookies.c... goto myNews programacao


15 Handpicked jQuery Drop Down Menus Tutorials new ...

Here we are presenting another brilliant collection of 15 jQuery navigation menu that you can downlo (...)

clicks: 2117 2117 2012-05-10 2012-05-10 goto url (new window) smashingapps.co... goto myNews programacao