PhantomJS is a minimalistic, headless, WebKit-based, JavaScript-driven tool

gojomo · on Jan 26, 2011

Cool!

But, have security issues been considered?

It looks like a Javascript thread-of-execution that can write to local filesystem paths (to render graphics, at least) may call into network-loaded DOM/code. Is there any assurance that page contents can't discover and use phantom API operations? (Or perhaps read local 'file:' URIs?)

kj12345 · on Jan 26, 2011

Wow, that's a good insight, seems like a possible attack vector. Maybe something like Adobe Air's security model of putting local access and network access in different iframes with a message-passing API between them would work. That's always felt like a bit of a hack to me but at least the separation between different frames in WebKit has been well-tested.

andrewf · on Jan 27, 2011

GreaseMonkey has to contend with these same issues. It has a security model, which will be broken if developers ignore the list of things you can't do.

http://wiki.greasespot.net/Security

wccrawford · on Jan 26, 2011

I wasn't excited about this until I realized it could be used to render a webpage to an image with a headless browser. I'm pretty happy about that.

Edit: The code for that is on http://code.google.com/p/phantomjs/wiki/QuickStart

irskep · on Jan 26, 2011

Yep, now I can finally write that Webkit-based book layout software I've been wanting!

irskep · on Jan 26, 2011

Wow, I've never had to do this before...could the downvoter explain him/herself?

hinathan · on Jan 26, 2011

This is really slick. It's tempting to think of this as a potential route towards headless functional testing — probably no substitute for something as heavy as selenium but for light tasks and DOM-based inspection of returned payloads (and screenshots, baked in!) it seems like a plausible foundation.

ehsanul · on Jan 26, 2011

Also note these other headless javascript testing frameworks:

Zombie.js: http://news.ycombinator.com/item?id=2038663

HTMLUnit: http://htmlunit.sourceforge.net/

Celerity (JRuby wrapper around HTMLUnit): http://celerity.rubyforge.org/

xtacy · on Jan 26, 2011

I think the main differentiating factor between phantomjs and the above mentioned ones is that phantomjs actually uses WebKit and has full support for everything that WebKit supports. Zombie.js uses the DOM API due to jsdom, which might have its own parsing intricacies.

regularfry · on Jan 26, 2011

HTMLUnit has its own share of parsing gotchas.

I've used Selenium with Firefox and Xvfb to do headless scraping in the past; looks like my toolkit just got simpler.

endtime · on Jan 26, 2011

I don't know that any of those support SVG. If you want to render e.g. Raphael.JS drawings on the server, headless JS with SVG DOM support is pretty useful.

Torn · on Jan 26, 2011

That actually sounds like a really neat idea - throwing together a PhantomJS script that scrapes the content of a particular div containing Raphael or Flot and then renders as png.

endtime · on Jan 26, 2011

Yeah, we're doing it with wkhtmltoimage right now, which works but makes me cringe every time I think about it.

pak · on Jan 26, 2011

I think the main difference for me between this and Selenium would be that Selenium offers some macros to click an element, or simulate a drag and drop between two elements. If you added those in here--wouldn't be too much more, I don't think--you'd have a competitive testing framework, albeit Webkit-only.

adatta02 · on Jan 26, 2011

Assuming you could get jQuery to run inside this, it should be pretty straightforward to translate the Selenium IDE's Selenese commands into jQuery statements.

I'm going to have to give this a run tomorrow in the AM.
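A rough sketch of what that Selenese-to-jQuery translation might look like, written in Python only so it runs on its own. The function names `locator_to_selector` and `selenese_to_jquery` are made up for illustration, and only three commands (`click`, `type`, `check`) and three locator kinds (`id=`, `name=`, `css=`) are handled. Whether the generated jQuery calls behave like real user input inside PhantomJS is an assumption that would need testing.

```python
import json


def locator_to_selector(locator):
    """Map a Selenese locator to a CSS selector.

    Hypothetical helper: only id=, name= and css= locators are handled,
    and values are not escaped for special selector characters.
    """
    kind, sep, value = locator.partition("=")
    if not sep:
        raise ValueError(f"locator has no '=': {locator!r}")
    if kind == "id":
        return "#" + value
    if kind == "name":
        return f'[name="{value}"]'
    if kind == "css":
        return value
    raise ValueError(f"unsupported locator: {locator!r}")


def selenese_to_jquery(command, target, value=""):
    """Return a jQuery statement (as a string) for one Selenese command."""
    # json.dumps produces a double-quoted string literal that is also valid JavaScript.
    selector = json.dumps(locator_to_selector(target))
    if command == "click":
        return f"$({selector}).click();"
    if command == "type":
        return f"$({selector}).val({json.dumps(value)});"
    if command == "check":
        return f"$({selector}).prop('checked', true);"
    raise ValueError(f"unsupported command: {command!r}")


if __name__ == "__main__":
    print(selenese_to_jquery("click", "id=login"))         # $("#login").click();
    print(selenese_to_jquery("type", "id=user", "alice"))  # $("#user").val("alice");
```

The resulting strings would still have to be evaluated inside the loaded page, which this sketch does not attempt.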

helper · on Jan 26, 2011

I'm sorry but I don't consider an application that must be run under a windowing environment to be "headless".

While this is cool, it's really not that different from projects like http://code.google.com/p/wkhtmltopdf/.

If you really want a headless webkit browser you would need to write a new webkit port to a graphics library that doesn't require a windowing system (maybe cairo).

andrewf · on Jan 27, 2011

Strictly speaking all this needs is a working QtGui component in the Qt library.

There are a couple of ways to compile Qt to operate against an in-memory framebuffer rather than a "real" windowing environment, although neither is a well supported part of modern Qt:

http://qt.gitorious.org/qt/pages/GettingStartedWithLighthous...

http://doc.qt.nokia.com/qtextended4.4/qvfb.html

epochwolf · on Jan 26, 2011

What's the cost of running an Xserver in the background? 20~30mb?

blago · on Jan 26, 2011

What happens when Xserver is not installed and you don't have the permissions or don't want to install it?

ivansavz · on Jan 26, 2011

the announcement blog post: http://ariya.blogspot.com/2011/01/phantomjs-minimalistic-hea...

xtacy · on Jan 26, 2011

Brilliant! "Just 250 lines of Qt and C++" shows how good Qt/WebKit are.

olalonde · on Jan 26, 2011

What is meant by "headless"?

sophiebits · on Jan 26, 2011

Presumably you don't see the WebKit window or anything like that; it runs invisibly and doesn't require any window server.

regularfry · on Jan 26, 2011

You don't need a desktop environment, so it can run without ancillary support on a server.

jontas · on Jan 26, 2011

Anyone know if this could be used for generating heatmaps? I'd basically need to identify the x,y offsets of elements on a page. I realize that these can be affected by the browser's width/height, but I'm hoping I can set those to generate the data.

smilliken · on Jan 26, 2011

This might be useful for HTML sanitization. You can allow anything as input (including scripts, styles, etc.), render it to a page in phantom, apply your whitelist on the effective DOM, and render it out as output. Of course, this might be resource intensive, and you have to be careful about phantomjs being sandboxed and having cpu/memory/timeout limits. The nice thing about this, though, is that 1) you get your serializer/deserializer for free, 2) it's very forgiving on malformed input, 3) the output is W3C valid since webkit corrects the DOM, and 4) you can support styles and scripts that affect the DOM.
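To make the "apply your whitelist" step concrete, here is a minimal sketch in Python. It covers only the filtering stage: the standard library's `html.parser` stands in for the WebKit-rendered DOM, so no scripts are executed and malformed markup is not repaired the way a real browser would repair it. The whitelist contents and the names `WhitelistSanitizer` and `sanitize` are assumptions chosen for illustration, not anything from PhantomJS.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for illustration.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}                      # tags that never get a closing tag
DROP_WITH_CONTENT = {"script", "style"}  # drop these together with their text


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # tag is dropped, but its text content is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Only plain web links survive; javascript: and similar are dropped.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(markup):
    """Return markup with only whitelisted tags and attributes kept."""
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
```

In the approach described above, the same filter would instead walk the DOM after WebKit has parsed the page and run its scripts, which is where the tolerance for malformed input would come from.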

ollysb · on Jan 26, 2011

Have been trying to find a solution to headless js testing in cucumber that doesn't suck or need java. A driver built on top of this would rock!

rb2k_ · on Jan 26, 2011

In case somebody wants an OSX binary but doesn't want to download the development tools and qt: http://blog.marc-seeger.de/2011/01/26/phantomjs_osx_binary

(Indirection over my blog in case I need to switch the file away from my webspace)

rb2k_ · on Jan 26, 2011

Ignore this, I thought it compiled a statically linked version... it didn't and I really suck when it comes to compiling C :(

mnutt · on Jan 27, 2011

Yeah, I had a really hard time getting qtwebkit to statically link on a mac as well. I think it may not be possible.

andrewf · on Jan 27, 2011

Not possible according to the docs: http://doc.qt.nokia.com/4.7/developing-on-mac.html#building-...

gregwebs · on Jan 26, 2011

I would really like to see a comparison between this and other options. (rhino, envjs, htmlunit).

blago · on Jan 26, 2011

They are completely different things. Rhino is a JS engine. envjs is a script that creates a mock window object and can run in an engine like Rhino. HtmlUnit comes close: it is implemented in Java and tries to SIMULATE some popular browsers. PhantomJS is... WebKit.

blago · on Jan 26, 2011

Not headless, many other projects, many other ways to do the same even without programming/compiling. I'm still longing to see a true headless browser that renders to Cairo or something else.

tworats · on Jan 26, 2011

The first two applications that come to mind are automated testing and screen scraping, I'm a little surprised they didn't include examples of those. Looks interesting though.

buddydvd · on Jan 26, 2011

Automated Testing: http://code.google.com/p/phantomjs/wiki/ServiceIntegration#J...

Screen Scraping: http://code.google.com/p/phantomjs/wiki/QuickStart#Rendering

z92 · on Jan 26, 2011

I can't find the example which generates PDF of the wikipedia page, as described on the front page.

Knacker_Hughes · on Jan 26, 2011

It's on http://code.google.com/p/phantomjs/wiki/QuickStart about three quarters of the way down the page:-

phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printa... jakarta.pdf

wicknicks · on Jan 26, 2011

How is this different from node.js?

rb2k_ · on Jan 26, 2011

This is a headless browser using webkit as a rendering engine.

Node.js is an asynchronous I/O framework.

wicknicks · on Jan 26, 2011

This is great indeed! I just tried it out. People who are running Ubuntu 10.10 and have Qt < v4.7 can change line 34 and comment out line 164 (to at least test out the examples on the website).

retube · on Jan 26, 2011

So this is a headless browser with a javascript api? Sweet.

toisanji · on Jan 26, 2011

I'm going to try this to do some web page scraping.