The project I’m supposed to be working on is going to have a large focus on XSLT. So a while ago I looked into it and had a play around.
Then I did some XSLT work related to the project. All was good.
But you never really learn anything unless you use it for your own requirements. So I thought: “HTML should be valid XML. I should be able to apply XSLT to an HTML page and get some values out of it.” The wiki we use at work supports an xslt tag, and the page I was going to target was bash.org’s latest page. On my first few attempts, I only got errors.
So I simplified it to use a working example (just to make sure that the XSLT engine was working in the wiki); all was good.
I changed the XSLT to be as simple as possible. Still good.
I then changed the source back to bash.org and it failed. Turns out the site’s markup isn’t well-formed XML. Pity.
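Something along these lines is what I had in mind; a minimal sketch, not the exact stylesheet I used, and it assumes the source page is well-formed XHTML in the http://www.w3.org/1999/xhtml namespace. It just prints the text of every link.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Print the text content of every <a> element, one per line -->
    <xsl:for-each select="//xhtml:a">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Feed that a well-formed XHTML page and you get a plain list of link texts; feed it tag soup and the XML parser gives up before the XSLT even runs.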
I have a feeling that the web will change and become quite interesting when everyone adheres to well-formed HTML.
Anyway, to the interesting point.
At one point, I was parsing an XSLT document as the source of information (since XSLT is well-formed XML anyway).
It got me thinking: could you ever write an XSLT transformer with XSLT? (Probably not.)
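Processing XSLT with XSLT does work in the small, though, since a stylesheet is just XML. As a toy sketch (purely illustrative, nowhere near a real transformer), something like this lists the match pattern of every template in whatever stylesheet you feed it:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- The input document is itself a stylesheet, so the xsl prefix
         matches its elements in the XPath below -->
    <xsl:for-each select="//xsl:template[@match]">
      <xsl:value-of select="@match"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

That is only reading XSLT as data, not executing it; actually implementing the transformation semantics in XSLT itself would be a very different beast.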
But the funny thing I thought of: the first bug tracking system. If it was done in an agile fashion, version 0.0.1 would only support creating bugs, and the bug list would look something like this:
* Cannot edit bugs
* Cannot resolve bugs
* Cannot assign bugs to person
* Cannot tell who created bugs
* Cannot distinguish between bug and feature request
Ah, distractions. I should probably get back to work.
-= Comments
1. Mark | November 16th, 2007 at 11:03 am
“HTML should be valid XML. I should be able to apply XSLT to an HTML page and get some values out of it.”
Valid XHTML is XML and you can apply transformations to it, but HTML isn’t well formed.
“I have a feeling that the web will change and become quite interesting when everyone adheres to well-formed HTML.”
I don’t think we’re going to get there any time soon. When XHTML is actually supported in browsers, I foresee a big shift in the way markup is generated and used; XHTML must be valid or it just dies. So people won’t be typing long strings of XHTML by hand, but rather generating it with other processes that ensure valid, well-structured markup.
It’s hard to see where we’ll end up, though. While web standards are gaining ground, so are frameworks where the bottom line is productivity and web standards aren’t on the priority list. I think we’re going to lose a lot of people to Flex; good luck parsing that! ASP.NET encourages you not to work with, or even think about, the generated markup; the controls are that abstracted. This is another reason I’m really enjoying Rails at the moment: it generates clean markup, and it encourages best practice in front-end coding.
–
But you can still use the DOM to walk the tree of a document, even if it isn’t completely valid. Thinking about it, you should be able to request a page with XMLHttpRequest and then traverse that with the DOM... maybe:
var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://www.bash.org/?latest', false); // synchronous request
xhr.send(null);
// ... do things ...
var content = document.createElement('div');
content.innerHTML = xhr.responseText;
// Get all links in the page
var linksInBashLatest = content.getElementsByTagName('a');
It’s not as efficient or as nice as XSL, but if you need to get parts of the document you should be able to do it with JavaScript... perhaps.