Recently I was having a little bit of fun and decided to write a pure JavaScript HTML parser. Some might remember one of my projects, env.js, which ported native browser JavaScript features to the server side (powered by Rhino). One thing that project lacked was an HTML parser (it parsed only strict XML).
I've been toying with the idea of porting env.js to other platforms (SpiderMonkey derivatives and the ECMAScript 4 Reference Implementation), and if I were to do so I would need an HTML parser. Because of this, it was easiest to just write an HTML parser in pure JavaScript.
I did some digging to see what people had previously built, but the landscape was pretty bleak. The only one I could find was by Erik Arvidsson: a simple SAX-style HTML parser. Since it contained only the most basic parsing, and none of the actual, complicated HTML logic, there was still a lot of work left to be done.
(I also contemplated porting the HTML 5 parser wholesale, but that seemed like a Herculean effort.)
However, the result is one that I'm quite pleased with. It won't match the compliance of html5lib, nor the speed of a pure XML parser, but it gets the job done with little fuss while still being highly portable.
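The SAX-style approach mentioned above (reporting tags and text through callbacks as the input is read, rather than building a tree) can be sketched in a few dozen lines of JavaScript. This is a minimal illustration under stated assumptions, not the parser from this article nor Erik Arvidsson's: the function name `parseHTML`, the handler names `start`, `end`, `chars` and `comment`, and the short void-element list are all hypothetical choices for this sketch. Beyond bare tokenizing, it adds two small pieces of HTML-specific logic: void elements such as `<br>` never wait for an end tag, and elements left open are closed implicitly.

```javascript
// A minimal SAX-style HTML parser sketch (illustrative only). It scans the
// input left to right and reports events through callbacks instead of
// building a document tree.

// Elements that never take an end tag in HTML (a subset, for illustration).
const VOID_ELEMENTS = new Set(["area", "br", "col", "hr", "img", "input", "link", "meta"]);

const START_TAG = /^<([a-zA-Z][\w-]*)((?:\s+[^\s"'>\/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?)*)\s*(\/?)>/;
const END_TAG = /^<\/([a-zA-Z][\w-]*)\s*>/;
const ATTR = /([^\s"'>\/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?/g;

// Hypothetical API: handler = { start(tag, attrs, unary), end(tag), chars(text), comment(text) }
function parseHTML(html, handler) {
  const stack = []; // names of currently open, non-void elements

  // Report end events for open elements down to and including `tag`
  // (or all of them when `tag` is null). A stray end tag with no
  // matching open element is ignored.
  function closeUpTo(tag) {
    const idx = tag === null ? 0 : stack.lastIndexOf(tag);
    if (idx === -1) return;
    while (stack.length > idx) handler.end(stack.pop());
  }

  while (html.length > 0) {
    let m;
    if (html.startsWith("<!--")) {
      // Comment: runs to "-->" or, if unterminated, to the end of input.
      const close = html.indexOf("-->");
      const stop = close === -1 ? html.length : close;
      handler.comment(html.slice(4, stop));
      html = html.slice(close === -1 ? html.length : close + 3);
    } else if ((m = html.match(END_TAG))) {
      closeUpTo(m[1].toLowerCase());
      html = html.slice(m[0].length);
    } else if ((m = html.match(START_TAG))) {
      const tag = m[1].toLowerCase();
      const attrs = {};
      for (const a of m[2].matchAll(ATTR)) {
        // Double-quoted, single-quoted, unquoted, or bare (empty) value.
        attrs[a[1].toLowerCase()] = a[2] ?? a[3] ?? a[4] ?? "";
      }
      const unary = VOID_ELEMENTS.has(tag) || m[3] === "/";
      handler.start(tag, attrs, unary);
      if (!unary) stack.push(tag);
      html = html.slice(m[0].length);
    } else {
      // Text runs up to the next "<"; a "<" that starts no tag is kept as text.
      let next = html.indexOf("<", 1);
      if (next === -1) next = html.length;
      handler.chars(html.slice(0, next));
      html = html.slice(next);
    }
  }
  closeUpTo(null); // implicitly close anything still open at end of input
}
```

For example, `parseHTML('<div><span>x</div>', handler)` reports `end` for `span` and then for `div` when it reaches `</div>`. A sketch like this still skips most of the rules real HTML needs, such as optional end tags, raw-text elements like `<script>`, and table fixups.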
This is just a short excerpt from the article; to read the full article, please follow the link below:
http://ejohn.org/blog/pure-javascript-html-parser/