"Here's the most important bit: Notch's testing is mind-bogglingly thorough."
For the record, this is completely contrary to his testing method for Minecraft. Every time there is a new version of Minecraft, there are more bugs than features. This is even true when a "bugfix" release comes out: more bugs are introduced than are fixed. Seriously, check it out: http://www.minecraftwiki.net/wiki/Version_history
The pre-1.8 release less than a week ago had an obvious and crippling bug. An experience system was added, and when you died you dropped your experience. But instead of dropping it as a single object, the game spawned thousands of objects, each holding a single XP point. This made the game completely unplayable if you were in the same area, and even crippled multiplayer servers for people who weren't.
Also, while that is the most recent glaring bug, there have been multiple occasions in the past where a release broke trees so that the leaves didn't decay after the trunk was removed. Punching down a tree is something you do in the first 30 seconds of a normal Minecraft game.
I've reflected on this in the past as well. It seems to me Notch doesn't do any kind of automated testing whatsoever. Some of the bugs would have been easy to discover with simple unit tests. The game is great and all, but he does seem to have a shoot-from-the-hip approach to development.
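For illustration, here's the kind of trivial check that would have caught the XP-orb bug, sketched in Python against a made-up drop_experience() helper (not Minecraft's actual code, which is Java; just the shape of the test):

import unittest

# Hypothetical helper: split a dead player's XP into a bounded number of orbs.
def drop_experience(total_xp, max_orbs=50):
    if total_xp <= 0:
        return []
    orb_count = min(total_xp, max_orbs)
    base, remainder = divmod(total_xp, orb_count)
    return [base + (1 if i < remainder else 0) for i in range(orb_count)]

class DropExperienceTest(unittest.TestCase):
    def test_orbs_are_consolidated(self):
        orbs = drop_experience(5000)
        self.assertLessEqual(len(orbs), 50)   # not thousands of entities
        self.assertEqual(sum(orbs), 5000)     # no XP lost or duplicated

if __name__ == "__main__":
    unittest.main()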
It was a pre-release. Also: leaves not decaying was the intended behavior at the time; it wasn't a bug.
But that’s not really the point. There is no way to test Minecraft thoroughly if you are pretty much on your own. It has too many features. Minecraft is too complex for this method to work.
I disagree with this. Using my bot framework I actually began writing a test suite. I stopped because it wouldn't actually have been very helpful to what I was doing due to various reasons, but I progressed enough to know it was very doable.
I think you're missing the primary benefit here, which is that he avoids regressions. By the time you are adding feature 256, you have 255 features which could be affected by the code changes you introduce. This is a huge part of software engineering, and one that has gotten lots of attention over the years.
The process you describe for your development sounds extremely tedious, but probably won't break down until month 2 of development on a team of one. Once you reach a level of complexity beyond this, that's where automated testing proves its worth.
The style of testing you're describing is commonly referred to as integration testing or acceptance testing, because it is designed to test the full stack in harmony. There are a number of great frameworks out there to help you do this. Cucumber is the one that's gotten the most love in the circles I'm familiar with. You can write your steps in Python or JavaScript, so don't worry that it's written in Ruby.
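To make that concrete, here's a minimal sketch using behave, a Cucumber-style framework for Python. The feature text and the context.game harness are invented for illustration, not any real engine's API:

# features/steps/potion_steps.py -- step definitions for a made-up feature:
#
#   Feature: Pickups
#     Scenario: Health potion restores health
#       Given a player with 50 health
#       When the player walks over a health potion
#       Then the player's health is 75
#
from behave import given, when, then

@given("a player with {health:d} health")
def step_spawn_player(context, health):
    # context.game is a hypothetical test harness wired up in environment.py
    context.player = context.game.spawn_player(health=health)

@when("the player walks over a health potion")
def step_walk_over_potion(context):
    context.game.spawn_item("health_potion", at=context.player.position)
    context.game.tick()   # advance the simulation one step so the pickup triggers

@then("the player's health is {expected:d}")
def step_check_health(context, expected):
    assert context.player.health == expected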
The typical thing to do once you've started doing automated testing is to actually write your tests first, watch them fail, then write the code to make the test pass. This forces you to ensure you have good test coverage (every feature is tested) and has been shown to result in better designed systems.
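A toy red/green example of that cycle (the fall-damage rule here is made up, not any real game's formula):

import unittest

# Write the test first; it fails because apply_fall_damage doesn't exist yet.
class FallDamageTest(unittest.TestCase):
    def test_short_falls_are_harmless(self):
        self.assertEqual(apply_fall_damage(health=20, fall_blocks=3), 20)

    def test_long_falls_hurt(self):
        self.assertEqual(apply_fall_damage(health=20, fall_blocks=10), 13)

# Written after watching the tests above fail: just enough code to pass.
def apply_fall_damage(health, fall_blocks):
    # one point of damage per block fallen beyond the first three
    return health - max(0, fall_blocks - 3)

if __name__ == "__main__":
    unittest.main()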
You have tons of reading to do if you want to learn more about this, but hit me up if you want a basic rundown.
I think games are hard (if not impossible) to test with unit tests or codified tests like the ones you mentioned.
At the company I work for we rely mostly on manual testing for the games and applications we make. We do some automated testing in the form of TinyTask tasks that are left running overnight to hammer an application. In terms of release testing, especially with a custom game engine, there's no easy way to codify a test which actually plays through a game that has some random mechanic, or checks for things like visual accuracy, windows flashing up on the screen when they're not supposed to, and so on.
Web testing frameworks like Selenium are great for UI testing, but they require identifying interface elements by their IDs, or performing a 100% image-recognition match for targeting. If anyone knows of a good UI testing framework that would work for games/DirectX apps, I'd definitely love to hear about it.
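For comparison, this is what ID-based targeting looks like with Selenium's Python bindings (the URL and element IDs are placeholders). It works because every HTML widget can carry a stable id attribute, which a DirectX game UI has no equivalent of:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get("http://example.com/login")
    driver.find_element(By.ID, "username").send_keys("testuser")
    driver.find_element(By.ID, "password").send_keys("secret")
    driver.find_element(By.ID, "submit").click()
    # crude acceptance check on the resulting page
    assert "Welcome" in driver.page_source
finally:
    driver.quit()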
I'm not bagging automated testing; I think it's a great way to save time and avoid the boring bits. I just don't think it's very relevant to game design, but for the OP's enterprise app job it would definitely be suitable.
I think the #1 reason why games don't need a lot of automated testing is serialization, or rather the common lack of it in most game worlds. If the data gets corrupt, you can reset the game and all is well again. When the rare corner cases pop up, it's easy to play whack-a-mole with them, because they're all "shallow" in the sense of the game starting from a clean reset frequently, so you can usually reproduce the bug quickly.
Of course, if the game does need heavy-duty serialization of everything, all bugs are potentially deadly. And this is largely the case for line-of-business, social, and productivity apps, because that data is considered mission-critical. Corruption is not OK, and reverting to backups is to be avoided. When a game has that kind of requirement, things get a lot tougher, and so games have naturally evolved to favor minimizing the save data to nothing or a few stats.
And this is borne out by the genres that do carry long-term state. Two genres have a history of tending to be buggy because of some form of long-term data corruption: large-scope RPGs and turn-based strategy games. It's hard to reach the bugs in those games, so it's also hard to fix them.
This is the difficulty for the company I work for. We make a range of desktop applications including launchers, productivity apps, media apps, games, etc.
Because all of these apps are based on a game engine, we get the complex and hard to test bugs that games get (texture cache corruption, null pointers in the scene tree, etc.). There's much more serialization and pressure than with pure games (but still not as much as with enterprise applications) because we're dealing with transactions with 3rd party services, or because a screw up could corrupt a user's whole music collection, or because users don't expect their music player to crash every 4 hours.
The game engine adds a huge amount of extra variability to our applications; we have to not only watch out for obscure bugs in our custom script code (which the engine silently tries/fails to execute anyway), but also in the (C/C++ based) game engine it is driving. The upside to using an engine with a simple scripting language is the shorter development times to get things off the ground and the high-performance/shiny visuals, but I feel it costs us much more down the line in terms of stability and extensibility.
I'm not sure I would be allowed/qualified to say much about our financials. From what I can gather we are profitable, with ongoing contracts for clients such as Dell.
At the risk of going too far off topic and without intending to be too harsh: at least in the video, that interface looks nice, but there is too much lag between when a tap/gesture is made and when the interface reflects it. This breaks the direct manipulation metaphor and is, I think, a showstopper. The pinch to zoom example was particularly off-putting: how can I know how much I'm zooming when the interface doesn't track my gesture? I cannot imagine being happy with pinching, waiting a second to see what's happened, then pinching and waiting to adjust (repeat until I get it right or am too frustrated).
I don't think there's too much danger in going off-topic if the conversation is interesting and has some substance.
Yeah, I think lag in touchscreen interaction is a huge problem, and it's largely a result of hardware. We've had to ship on Atom-based hardware with 2003-era DX9 graphics. Even the latest Intel chips with DX10 are horribly slow when it comes to graphics.
The other side is the actual touchscreen hardware. The machine used in the video is an HP TouchSmart, which uses an optical touchscreen panel. These panels have a latency of around 100-200ms, and that's before we even start processing the touch-event information. Capacitive touch sensors are much better, but they're expensive to manufacture above about 10 inches.
In the end lag/accuracy is a reality of the low-cost hardware OEMs use, and there will always be some trade-off between hardware cost and performance.
I am going to have to disagree with you on the unit-test part. Nothing in a game outside of graphics has any inherent limitation that prevents automated unit testing. A large part of your graphics pipeline can be unit tested as well; most of the scene graph operations, for instance, should be unit testable. Sure, you could still end up with display bugs due to driver interaction, but you should still try to lock down what you can.
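As a rough illustration of the kind of scene-graph check that needs no GPU or driver at all (SceneNode here is a stand-in, not any particular engine's API):

import unittest

class SceneNode:
    def __init__(self, x=0.0, y=0.0):
        self.x, self.y = x, y
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

    def world_position(self, parent_x=0.0, parent_y=0.0):
        # child positions are stored relative to the parent
        return (parent_x + self.x, parent_y + self.y)

class SceneGraphTest(unittest.TestCase):
    def test_child_position_is_relative_to_parent(self):
        root = SceneNode(10.0, 5.0)
        child = root.add_child(SceneNode(1.0, 2.0))
        px, py = root.world_position()
        self.assertEqual(child.world_position(px, py), (11.0, 7.0))

if __name__ == "__main__":
    unittest.main()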
I think you can use automated testing for games, usually... It requires a few compromises.
If you're feeding input into a non-deterministic game, the character won't be exactly where they were last time, etc. You could keep track of the movement offset and see if it's within an acceptable range, or you could check the character's speed at two times and make sure they're accelerating properly, etc. Design a test level to highlight the potential problems (weird ground polys, whatever) and write an in-game script to test the behavior.
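Something like this, say, where the game/player objects are placeholders for whatever scripting hooks the engine exposes:

GRAVITY = -9.8          # design value, units per second squared
TOLERANCE = 0.5         # acceptable deviation for a non-deterministic run

def check_fall_acceleration(game, player, dt=0.5):
    v1 = player.vertical_speed
    game.advance(seconds=dt)          # step the simulation; input is scripted elsewhere
    v2 = player.vertical_speed
    measured = (v2 - v1) / dt
    assert abs(measured - GRAVITY) < TOLERANCE, (
        "measured acceleration %.2f is outside tolerance of %.2f" % (measured, GRAVITY))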
I think I could even script visual checks - mostly. You'd make, for example, a level prone to Z-buffering errors, where the colors were chosen to make it obvious - like a red wall showing through a white one. Save a stream of screenshots and compare them. Trivially, check for red. More complex, and perhaps better, check for high-contrast areas. It wouldn't tell you if the picture looked good overall but it'd be fairly good at finding instances of that one bug.
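A rough version of the "check for red" idea using Pillow, assuming the test level dumps frames to PNG files and the offending wall is pure red:

from PIL import Image

def frame_shows_red_bleed(path, threshold=50):
    img = Image.open(path).convert("RGB")
    red_pixels = sum(
        1 for r, g, b in img.getdata()
        if r > 200 and g < 60 and b < 60
    )
    return red_pixels > threshold   # more than a handful of red pixels = the wall leaked through

# e.g. assert not any(frame_shows_red_bleed(f) for f in sorted(glob.glob("frames/*.png")))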
And for unit tests: don't be so quick to write a test so elaborate that throwing it away is painful when you want to refactor the code.
Given that the OP mentioned he's an Android developer, it's worth noting that the Android SDK possibly has the very worst testing environment of any widely used system on the planet, ever*.
* possibly an exaggeration, but it's definitely very very bad.
So I was developing a multicast serial driver. And since I was writing the serial driver, I had to take... over... communication... from the copied-from-ROM driver.
And since there was only 1 working communication port, how did I debug? How did I even run a test?
The board I was developing on was amazing. It had 1 (one) LED.
What specifically would you like to see in terms of testing tools for Android? I ask this not because I don't think there are areas to improve (there are), but to get a better idea of where improvement is most needed.
You're talking about Test-Driven Development (TDD). Any good resources for integrating TDD into JavaScript-based games? I'm still looking for that gem.
To expand on testing a bit: at one end there's unit testing, where you test maybe a single object's behavior in the game. For game dev, I think testing for exact values matters less than testing for deterministic events; we'd rather know whether a player has jumped, died, or shot something.
Eventually, functional testing is needed, once objects interact with one another. It's like a system of systems.
That one's more complicated to design. It might go as far as designing a test that runs through a complete level within milliseconds. Anyone have ideas/resources on this?
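One possible shape for it, assuming the engine can step its simulation headless at a fixed timestep (the Level and bot objects are placeholders):

def run_level_headless(level, bot, timestep=1.0 / 60, max_steps=100_000):
    # No rendering and no real-time waits, so a full play-through takes milliseconds.
    events = []
    for _ in range(max_steps):
        bot.act(level)                      # scripted input: walk right, jump gaps, shoot
        level.step(timestep)                # advance the simulation one fixed tick
        events.extend(level.drain_events()) # e.g. "jumped", "died", "boss_killed"
        if level.completed:
            return events
    raise AssertionError("level never completed")

# A functional test then asserts on the event stream, not exact positions:
# events = run_level_headless(Level("world_1_1"), ScriptedBot())
# assert "died" not in events and "boss_killed" in events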
I do something similar with my web-app code. I keep an iPad open next to me when I code. Anytime I hit Cmd+B in my text-editor, it rsyncs the site to the server and some JS code running on the iPad automatically refreshes the page. It works quite well for creating HTML layout and updating CSS nearly-on-the-fly. If I have 4-5 browsers open on my Mac, they all instantly update too. So all I have to do is Cmd+Tab through them to verify it all looks good.
Editor: BBEdit. Cmd+B is hooked to the following AppleScript for my project:
tell application "BBEdit"
    -- save whatever I'm editing before the sync runs
    save front document
end tell
tell application "Terminal"
    -- run the build/rsync script, reusing an open Terminal window if there is one
    if (count of windows) is 0 then
        do script "/path.to.my/make.command"
    else
        do script "/path.to.my/make.command" in window 1
    end if
end tell
tell application "Google Chrome"
    -- bring the browser forward so I can check the result
    activate
end tell
Brilliant. AppleScript is underrated as an automation language. Sure, it's a bit cumbersome to write, but nobody says you have to use it for everything. It plays well enough with shell scripts that you can leave the heavy lifting to another language and use AppleScript as a bridge between the command line and the UI. A friend and I once wrote a bot in Python to control various music players from IRC. The bot uses AppleScript to talk to iTunes on OS X, and GObject introspection (or whatever they use on GNOME these days; I have little experience with desktop Linux) to talk to Rhythmbox. See http://github.com/GeneralMaximus/amazing-horse
All of Apple's official applications support scripting, as do most good third-party apps. You don't need extra JS on your page to auto-refresh Safari. This little snippet reloads the front-most Safari tab:
tell application "Safari"
    set sameURL to URL of tab 1 of front window
    set URL of tab 1 of front window to sameURL
end tell
You can even send Safari a snippet of JS to run. It's also possible to automate any UI interaction by sending mouse clicks or keyboard events via tell application "System Events".
You can look at any app's AppleScript dictionary using the AppleScript Editor. Go File -> Open AppleScript Dictionary ... and then pick your app.
I do a similar thing in bash for more normal development - basically a script to run my unit tests, look for OK/Error in the output and print either a couple of bars in green or the error output in red. It runs over and over again in a separate monitor, so as soon as you have wrong code, you know about it.
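The same idea sketched in Python rather than bash, in case anyone wants a starting point:

import subprocess, time

GREEN, RED, RESET = "\033[42m", "\033[41m", "\033[0m"

while True:
    # run the whole test suite and look at the exit code instead of grepping for OK/Error
    result = subprocess.run(
        ["python", "-m", "unittest", "discover"],
        capture_output=True, text=True)
    if result.returncode == 0:
        print(GREEN + " " * 60 + RESET)      # all green, carry on
    else:
        print(RED + result.stderr + RESET)   # show the failure output in red
    time.sleep(5)                            # or trigger off file-change events instead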
"He began by building the engine, and to do this he used the 'HotSwap' functionality of the Java JVM 1.4.2, which continuously updates the running code when it detects that a class has changed."
If anybody at Google is reading this: add this feature to the Android emulator and I will literally drive to your house and kiss you on the mouth.
Indeed, the fact that Android doesn't already have this makes it hard for me to take it seriously as a platform. Immediate feedback is crucial to any development environment.
Use an actual device, not the emulator. The emulator is super slow, but with an actual device you can have 30s iteration times.
Edit: I know 30s is still nowhere near ideal. Hell, on Linux Chromium (a massive project) we have faster iterations than that. But it's good compared to the Android emulator.
I think you should be able to go even faster, but I need to dig into how Ant actually works. It seems to be redoing more work than necessary for each build.
I learned some things too. However, even if you have a tiny piece of software, like a game that you can reliably complete manually in 20 minutes, completely playing it through after every change is way too slow for "live coding". You should see your results instantly, just like your test suite should run in seconds. If you have to wait minutes to compile, deploy and test, your productivity will fall. A full integration/acceptance test suite can take longer, but it should be as automatic as possible; you can't use it for rapid feedback like this, but rather for regression/acceptance testing.
Notch was able to take advantage of immediate feedback for most of his coding. The first hours, spent on rendering, were tested by watching the world being rendered live. I've been doing the same developing a game in Clojure. When he got as far as the gameplay, he slowed down considerably, since he had to wander around in the game world to test each new feature. For example, he made temporary shortcut passages so he could test the boss monsters without actually having to pass through the levels.
He is a person who can concentrate on delivering results and hack away at code for long stretches. This is the kind of code he has presumably written again and again for years. What professional web developer couldn't hack together a small site as quickly? Where most people fail is attention span and drive! I know my unfinished projects speak to that :)
The most interesting part of watching Notch code, for me, was seeing how he used his tools. And it was inspiring and motivating to see the progress. The actual code was very hackish, but code quality isn't important in a throw-away project anyway :)
The way the article describes Notch's testing methods sounds like a glaring anti-pattern. This reminds me of my first QA job... Thankfully another post in here suggests that he actually takes pains to mold the level design, etc., to make the game more easily testable.
For those who didn't RTFA (though since this isn't Slashdot, there shouldn't be any): Notch is reputed to have given some segment of his game a complete replay every time he made a change, though it isn't mentioned exactly how often this happened or how big the changes were. A note is made that his build scripts make this almost instantaneous.
This is a problem because manual testing takes forever and causes tester fatigue. You stop doing a good job.
It can be harder to test behavior in a 3D game than in a text-filtering app, but IMHO a good design is a testable one. (This does not necessarily mean I'm that good of a designer yet...)
I'm convinced the moderator of this blog has actually fallen in love with Notch. They won't approve my comment because it's derogatory toward Notch's dev practices...
Especially in this Ludum Dare case, development practices and implementation details don't matter as much as results. And he did get results; that's why it was interesting. I'd like to hear what you think was wrong, and why. This was his show of productivity and results. Other people have other ways, and other circumstances can be completely different. And they should be.
"Right now every box in a model has to be individually animated to match the part it’s attached to, and there’s no grouping or hierarcy [sic]. " -- Notch
To repeat my earlier point: I say this as the owner of a library that provides a bot API for Minecraft: https://github.com/superjoe30/mineflayer
All I'm saying is... "thoroughness" and "testing" are not two words I expected to see in the same sentence as Notch's methodology.