When you look around on security mailing lists you’ll probably notice an increase in security warnings relating to web applications… many of them based on JS code injected into a webpage.
This has led to the uncomfortable situation where pages that are based on user content cannot trust their users to provide JS as part of their submitted content. So now we can share video, audio and other passive media, but anything interactive is out of the question.
What to do about it? The JS security system is entirely based on domain names, and some providers have resorted to running all user JS code on a separate domain… but this again limits the usefulness of JS because it can only operate within the assigned iFrame. Others are trying to run the JS code through code analysis tools to find out if it is doing anything “forbidden”.
But who are we kidding? Blacklist attempts have never worked so far and the thing about web security is that even a single attack can leave data from dozens of apps exposed.
The alternative is quite simple, but to the best of my knowledge has never been tried: implementing a second language in JS, running protected in a separate sandbox, allowing only whitelisted calls and if necessary filtering the results. Is this possible? Certainly. Is it hard? Not as hard as one would imagine. Is it slow? Definitely slower than true JS, but still fast enough to be of use.
Let’s tackle these questions one by one:
Is it possible? Every language that can implement basic text parsing can implement its own parser… it’s really as simple as that. And in JS it’s even easier because we have a bunch of text processing tools like regular expressions that make parsing quite straightforward.
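To make this a bit more concrete, here is a minimal sketch of a regular-expression tokenizer feeding a small recursive-descent parser. The toy grammar (numbers, names, `+ - * /` and parentheses), the function names `tokenize` and `parse`, and the `{ atom: … }` node shape are all assumptions made up for illustration, not a finished design.

```javascript
// Hypothetical toy grammar: numbers, names, + - * / and parentheses.
function tokenize(source) {
  const text = source.trim();
  // Sticky (y) flag: the pattern must match exactly at lastIndex.
  const pattern = /\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|([-+*\/()]))/y;
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    pattern.lastIndex = pos;
    const m = pattern.exec(text);
    if (m === null) throw new SyntaxError("Unexpected input near position " + pos);
    if (m[1] !== undefined) tokens.push({ type: "number", value: Number(m[1]) });
    else if (m[2] !== undefined) tokens.push({ type: "name", value: m[2] });
    else tokens.push({ type: "op", value: m[3] });
    pos = pattern.lastIndex;
  }
  return tokens;
}

// Recursive descent: sum -> product -> primary, so * and / bind tighter.
function parse(tokens) {
  let i = 0;
  const isOp = (...ops) =>
    i < tokens.length && tokens[i].type === "op" && ops.includes(tokens[i].value);

  function primary() {
    const t = tokens[i++];
    if (t === undefined) throw new SyntaxError("Unexpected end of input");
    if (t.type === "number") return { atom: "number", value: t.value };
    if (t.type === "name") return { atom: "variable", name: t.value };
    if (t.value === "(") {
      const inner = sum();
      if (!isOp(")")) throw new SyntaxError("Expected )");
      i++;
      return inner;
    }
    throw new SyntaxError("Unexpected token " + t.value);
  }

  function product() {
    let left = primary();
    while (isOp("*", "/")) {
      const op = tokens[i++].value;
      left = { atom: "binary", op, left, right: primary() };
    }
    return left;
  }

  function sum() {
    let left = product();
    while (isOp("+", "-")) {
      const op = tokens[i++].value;
      left = { atom: "binary", op, left, right: product() };
    }
    return left;
  }

  const tree = sum();
  if (i < tokens.length) throw new SyntaxError("Unexpected token " + tokens[i].value);
  return tree;
}
```

Calling `parse(tokenize("1 + 2 * x"))` yields a `+` node whose right-hand side is the `*` node, so operator precedence falls out of the nesting of the three parser functions.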
Is it hard? Not really… many of the requirements for the interpreted language can be mapped to native behaviour. For example: the garbage collector can work for the interpreted language as well if we map stacks and variables in the interpreted language back to native objects.
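One way to picture this mapping, as a rough sketch only: each scope of the interpreted language becomes a plain native object whose prototype is the enclosing scope. The helper names `createScope`, `declare` and `lookup` are hypothetical. Once nothing references a scope object any more, the native garbage collector reclaims it along with its variables, without the interpreter doing anything.

```javascript
// Each interpreted scope is a native object; its prototype is the parent scope.
// The outermost scope has a null prototype, so nothing from Object.prototype
// (for example "constructor") leaks into the interpreted language.
function createScope(parent) {
  return Object.create(parent === undefined ? null : parent);
}

function declare(scope, name, value) {
  scope[name] = value;
}

function lookup(scope, name) {
  // The "in" operator walks the prototype chain, i.e. the enclosing scopes.
  if (!(name in scope)) throw new ReferenceError(name + " is not defined");
  return scope[name];
}

// Usage: a global scope and one nested call scope.
const globalScope = createScope();
declare(globalScope, "x", 1);

let callScope = createScope(globalScope);
declare(callScope, "y", 2);
const sum = lookup(callScope, "x") + lookup(callScope, "y"); // 3

// Dropping the last reference is all the "memory management" needed;
// the native garbage collector is then free to reclaim the scope.
callScope = null;
```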
Is it slow? To answer this question we have to remember how code is usually stored in high-level languages: the CodeDOM. The CodeDOM is a simple, object-based tree structure in which any number of atoms make up expressions. Once we have parsed the expressions into this DOM and inserted all implicit behaviour, executing code is really just a matter of walking the tree. So each interpreted operation means running the atom handler and following the tree. The atom handlers usually don’t change and can therefore be compiled by the JS engine, and the jump to the next atom is just a matter of following a single reference. Combine that with the fact that we can replace known atom combinations with optimized functions and you’ll see that this is fast enough for the majority of simple web apps.
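A minimal sketch of such a tree walk might look like the following. The node shape (`{ atom: "number" | "variable" | "binary", … }`), the `handlers` table and the functions `run` and `fold` are assumptions invented for this example. `fold` stands in for the "replace known atom combinations" idea in its simplest form, collapsing a binary atom with two constant operands into a single number atom before execution.

```javascript
// One handler per atom type. The handlers are ordinary JS functions,
// so the JS engine can optimize them like any other code.
const handlers = new Map([
  ["number", (node) => node.value],
  ["variable", (node, scope) => {
    if (!scope.has(node.name)) throw new ReferenceError(node.name + " is not defined");
    return scope.get(node.name);
  }],
  ["binary", (node, scope) => {
    const left = run(node.left, scope);
    const right = run(node.right, scope);
    switch (node.op) {
      case "+": return left + right;
      case "-": return left - right;
      case "*": return left * right;
      case "/": return left / right;
      default: throw new Error("Unknown operator " + node.op);
    }
  }],
]);

// Executing code is just: look up the handler, run it, follow the references.
function run(node, scope) {
  const handler = handlers.get(node.atom);
  if (handler === undefined) throw new Error("Unknown atom " + node.atom);
  return handler(node, scope);
}

// Simplest possible optimization pass: fold constant sub-expressions once,
// so they are not recomputed on every run.
function fold(node) {
  if (node.atom !== "binary") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  const folded = { atom: "binary", op: node.op, left, right };
  if (left.atom === "number" && right.atom === "number") {
    return { atom: "number", value: run(folded, new Map()) };
  }
  return folded;
}

// Usage: the tree for (2 * 3) + x, written out by hand.
const tree = {
  atom: "binary", op: "+",
  left: {
    atom: "binary", op: "*",
    left: { atom: "number", value: 2 },
    right: { atom: "number", value: 3 },
  },
  right: { atom: "variable", name: "x" },
};
const scope = new Map([["x", 4]]);
const result = run(fold(tree), scope); // 10
```

The handler table is a `Map` rather than a plain object so that an atom name such as `constructor` cannot accidentally resolve to something inherited from `Object.prototype`.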
Just think about what people could do if their apps were not restricted to their iFrames… youTubeOS? mySpace dynamic layouts? The sky would be the limit (that and the rules inserted into the interpreter… mySpace could opt to give users full access to the page’s elements, but not to their ads and not to the document and window elements).
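Such rules could be as simple as a table of permitted host calls that the interpreter consults before touching anything outside the sandbox. The following is only a sketch: `createGateway`, the `readElement` call, the `ad-` id prefix and the `page` map (a stand-in for real page elements so the code runs outside a browser) are all hypothetical.

```javascript
// Hypothetical gateway: the interpreter routes every host call through here.
// Anything not in the whitelist simply does not exist for interpreted code.
function createGateway(whitelist) {
  return function callHost(name, args) {
    const entry = whitelist.get(name);
    if (entry === undefined) {
      throw new Error("Call not whitelisted: " + name);
    }
    const result = entry.fn(...args);
    // Optional second line of defence: filter what goes back to user code.
    return entry.filter === undefined ? result : entry.filter(result);
  };
}

// Stand-in for real page elements.
const page = new Map([
  ["title", "My profile"],
  ["bio", "<b>hi</b>"],
  ["ad-banner", "Buy now"],
]);

const callHost = createGateway(new Map([
  ["readElement", {
    // Rule: user code may read page elements, but never the ads.
    fn: (id) => {
      if (String(id).startsWith("ad-") || !page.has(id)) return null;
      return page.get(id);
    },
    // Filter: escape opening angle brackets in whatever is returned.
    filter: (text) => (text === null ? null : text.replace(/</g, "&lt;")),
  }],
]));

callHost("readElement", ["title"]);     // "My profile"
callHost("readElement", ["ad-banner"]); // null
```

Because this is a whitelist, a call such as `callHost("eval", ["1"])` fails by default; nobody has to remember to forbid it.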