"Here's the most important bit: Notch's testing is mind-bogglingly thorough."
For the record, this is completely contrary to his testing method for Minecraft. Every time there is a new version of Minecraft, there are more bugs than features. This is even true when a "bugfix" release comes out: more bugs are introduced than are fixed. Seriously, check it out: http://www.minecraftwiki.net/wiki/Version_history
The pre-1.8 release less than a week ago had an obvious and crippling bug. An experience system was added, and when you died you dropped your experience. But instead of dropping it as a single object, the game spawned thousands of objects, each holding a single XP point. This made the game completely unplayable if you were in the same area, and even crippled multiplayer servers for people who weren't.
Also, while that is the most recent glaring bug, there have been multiple occasions in the past where a release broke trees so that the leaves didn't decay after the trunk was removed. Punching down a tree is something you do in the first 30 seconds of a normal Minecraft game.
I've reflected on this in the past as well. It seems to me Notch doesn't do any kind of automated testing whatsoever. Some of the bugs would have been easy to discover with simple unit tests. The game is great and all, but he does seem to have a shoot-from-the-hip approach to development.
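For illustration, here's the kind of trivial check that would have caught the XP-orb bug, sketched in Python against a made-up drop_experience() helper (not Minecraft's actual code, which is Java; just the shape of the test):

import unittest

# Hypothetical helper: split a dead player's XP into a bounded number of orbs.
def drop_experience(total_xp, max_orbs=50):
    if total_xp <= 0:
        return []
    orb_count = min(total_xp, max_orbs)
    base, remainder = divmod(total_xp, orb_count)
    return [base + (1 if i < remainder else 0) for i in range(orb_count)]

class DropExperienceTest(unittest.TestCase):
    def test_orbs_are_consolidated(self):
        orbs = drop_experience(5000)
        self.assertLessEqual(len(orbs), 50)   # not thousands of entities
        self.assertEqual(sum(orbs), 5000)     # no XP lost or duplicated

if __name__ == "__main__":
    unittest.main()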
It was a pre-release. Also: leaves not decaying was the intended behavior at the time; it wasn't a bug.
But that’s not really the point. There is no way to test Minecraft thoroughly if you are pretty much on your own. It has too many features. Minecraft is too complex for this method to work.
I disagree with this. Using my bot framework I actually began writing a test suite. I stopped because it wouldn't actually have been very helpful to what I was doing due to various reasons, but I progressed enough to know it was very doable.
I think you're missing the primary benefit here, which is that he avoids regressions. By the time you are adding feature 256, you have 255 features which could be affected by the code changes you introduce. This is a huge part of software engineering, and one that has gotten lots of attention over the years.
The process you describe for your development sounds extremely tedious, but probably won't break down until month 2 of development on a team of one. Once you reach a level of complexity beyond this, that's where automated testing proves its worth.
The style of testing you're describing is commonly referred to as integration testing or acceptance testing, because it is designed to test the full stack in harmony. There are a number of great frameworks out there to help you do this. Cucumber is the one that's gotten the most love in the circles I'm familiar with. You can write your steps in Python or JavaScript, so don't worry that it's written in Ruby.
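To make that concrete, here's a minimal sketch using behave, a Cucumber-style framework for Python. The feature text and the context.game harness are invented for illustration, not any real engine's API:

# features/steps/potion_steps.py -- step definitions for a made-up feature:
#
#   Feature: Pickups
#     Scenario: Health potion restores health
#       Given a player with 50 health
#       When the player walks over a health potion
#       Then the player's health is 75
#
from behave import given, when, then

@given("a player with {health:d} health")
def step_spawn_player(context, health):
    # context.game is a hypothetical test harness wired up in environment.py
    context.player = context.game.spawn_player(health=health)

@when("the player walks over a health potion")
def step_walk_over_potion(context):
    context.game.spawn_item("health_potion", at=context.player.position)
    context.game.tick()   # advance the simulation one step so the pickup triggers

@then("the player's health is {expected:d}")
def step_check_health(context, expected):
    assert context.player.health == expected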
The typical thing to do once you've started doing automated testing is to actually write your tests first, watch them fail, then write the code to make the test pass. This forces you to ensure you have good test coverage (every feature is tested) and has been shown to result in better designed systems.
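A toy red/green example of that cycle (the fall-damage rule here is made up, not any real game's formula):

import unittest

# Write the test first; it fails because apply_fall_damage doesn't exist yet.
class FallDamageTest(unittest.TestCase):
    def test_short_falls_are_harmless(self):
        self.assertEqual(apply_fall_damage(health=20, fall_blocks=3), 20)

    def test_long_falls_hurt(self):
        self.assertEqual(apply_fall_damage(health=20, fall_blocks=10), 13)

# Written after watching the tests above fail: just enough code to pass.
def apply_fall_damage(health, fall_blocks):
    # one point of damage per block fallen beyond the first three
    return health - max(0, fall_blocks - 3)

if __name__ == "__main__":
    unittest.main()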
You have tons of reading to do if you want to learn more about this, but hit me up if you want a basic rundown.
I think games are hard (if not impossible) to test with unit tests or codified tests like the ones you mentioned.
At the company I work for we rely mostly on manual testing for the games and applications we make. We do some automated testing in the form of TinyTask tasks that are left running overnight to hammer an application. In terms of release testing, especially with a custom game engine, there's no easy way to codify a test which actually plays through a game that has some random mechanic, or checks for things like visual accuracy, windows flashing up on the screen when they're not supposed to, and so on.
Web testing frameworks like Selenium are great for UI testing, but they require identifying interface elements by their IDs, or performing a 100% image-recognition match for targeting. If anyone knows of a good UI testing framework that would work for games/DirectX apps, I'd definitely love to hear about it.
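For comparison, this is what ID-based targeting looks like with Selenium's Python bindings (the URL and element IDs are placeholders). It works because every HTML widget can carry a stable id attribute, which a DirectX game UI has no equivalent of:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get("http://example.com/login")
    driver.find_element(By.ID, "username").send_keys("testuser")
    driver.find_element(By.ID, "password").send_keys("secret")
    driver.find_element(By.ID, "submit").click()
    # crude acceptance check on the resulting page
    assert "Welcome" in driver.page_source
finally:
    driver.quit()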
I'm not bagging automated testing; I think it's a great way to save time and avoid the boring bits. I just don't think it's very relevant to game design, but for the OP's enterprise app job it would definitely be suitable.
I think the #1 reason why games don't need a lot of automated testing is serialization, or rather the common lack of it in most game worlds. If the data gets corrupt, you can reset the game and all is well again. When the rare corner cases pop up, it's easy to play whack-a-mole with them, because they're all "shallow" in the sense of the game starting from a clean reset frequently, so you can usually reproduce the bug quickly.
Of course, if the game does need heavy-duty serialization of everything, all bugs are potentially deadly. And this is largely the case for line-of-business, social, and productivity apps, because that data is considered mission-critical. Corruption is not OK, and reverting to backups is to be avoided. When a game has that kind of requirement, things get a lot tougher, and so games have naturally evolved to favor minimizing the save data to nothing or a few stats.
And this is borne out by the genres that do carry long-term state. Two genres have a history of tending to be buggy because of some form of long-term data corruption: large-scope RPGs and turn-based strategy games. It's hard to reach the bugs in those games, so it's also hard to fix them.
This is the difficulty for the company I work for. We make a range of desktop applications including launchers, productivity apps, media apps, games, etc.
Because all of these apps are based on a game engine, we get the complex and hard to test bugs that games get (texture cache corruption, null pointers in the scene tree, etc.). There's much more serialization and pressure than with pure games (but still not as much as with enterprise applications) because we're dealing with transactions with 3rd party services, or because a screw up could corrupt a user's whole music collection, or because users don't expect their music player to crash every 4 hours.
The game engine adds a huge amount of extra variability to our applications; we have to not only watch out for obscure bugs in our custom script code (which the engine silently tries/fails to execute anyway), but also in the (C/C++ based) game engine it is driving. The upside to using an engine with a simple scripting language is the shorter development times to get things off the ground and the high-performance/shiny visuals, but I feel it costs us much more down the line in terms of stability and extensibility.
I'm not sure I would be allowed/qualified to say much about our financials. From what I can gather we are profitable, with ongoing contracts for clients such as Dell.
At the risk of going too far off topic and without intending to be too harsh: at least in the video, that interface looks nice, but there is too much lag between when a tap/gesture is made and when the interface reflects it. This breaks the direct manipulation metaphor and is, I think, a showstopper. The pinch to zoom example was particularly off-putting: how can I know how much I'm zooming when the interface doesn't track my gesture? I cannot imagine being happy with pinching, waiting a second to see what's happened, then pinching and waiting to adjust (repeat until I get it right or am too frustrated).
I don't think there's too much danger in going off-topic if the conversation is interesting and has some substance.
Yeah, I think lag in touchscreen interaction is a huge problem, and it's largely a result of hardware. We've had to ship on Atom-based hardware with 2003-era DX9 graphics. Even the latest Intel chips with DX10 are horribly slow when it comes to graphics.
The other side is the actual touchscreen hardware. The machine used in the video is an HP TouchSmart, which uses an optical touchscreen panel. These panels have a latency of around 100-200ms, and that's before we even start processing the touch-event information. Capacitive touch sensors are much better, but they're expensive to manufacture above about 10 inches.
In the end lag/accuracy is a reality of the low-cost hardware OEMs use, and there will always be some trade-off between hardware cost and performance.
I am going to have to disagree with you on the unit-test part. Nothing in a game outside of graphics has any inherent limitation that prevents automated unit testing. A large part of your graphics pipeline can be unit tested as well; most of the scene graph operations, for instance, should be unit testable. Sure, you could still end up with display bugs due to driver interaction, but you should still try to lock down what you can.
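As a rough illustration of the kind of scene-graph check that needs no GPU or driver at all (SceneNode here is a stand-in, not any particular engine's API):

import unittest

class SceneNode:
    def __init__(self, x=0.0, y=0.0):
        self.x, self.y = x, y
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

    def world_position(self, parent_x=0.0, parent_y=0.0):
        # child positions are stored relative to the parent
        return (parent_x + self.x, parent_y + self.y)

class SceneGraphTest(unittest.TestCase):
    def test_child_position_is_relative_to_parent(self):
        root = SceneNode(10.0, 5.0)
        child = root.add_child(SceneNode(1.0, 2.0))
        px, py = root.world_position()
        self.assertEqual(child.world_position(px, py), (11.0, 7.0))

if __name__ == "__main__":
    unittest.main()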
I think you can use automated testing for games, usually... It requires a few compromises.
If you're feeding input into a non-deterministic game, the character won't be exactly where they were last time, etc. You could keep track of the movement offset and see if it's within an acceptable range, or you could check the character's speed at two times and make sure they're accelerating properly, etc. Design a test level to highlight the potential problems (weird ground polys, whatever) and write an in-game script to test the behavior.
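Something like this, say, where the game/player objects are placeholders for whatever scripting hooks the engine exposes:

GRAVITY = -9.8          # design value, units per second squared
TOLERANCE = 0.5         # acceptable deviation for a non-deterministic run

def check_fall_acceleration(game, player, dt=0.5):
    v1 = player.vertical_speed
    game.advance(seconds=dt)          # step the simulation; input is scripted elsewhere
    v2 = player.vertical_speed
    measured = (v2 - v1) / dt
    assert abs(measured - GRAVITY) < TOLERANCE, (
        "measured acceleration %.2f is outside tolerance of %.2f" % (measured, GRAVITY))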
I think I could even script visual checks - mostly. You'd make, for example, a level prone to Z-buffering errors, where the colors were chosen to make it obvious - like a red wall showing through a white one. Save a stream of screenshots and compare them. Trivially, check for red. More complex, and perhaps better, check for high-contrast areas. It wouldn't tell you if the picture looked good overall but it'd be fairly good at finding instances of that one bug.
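A rough version of the "check for red" idea using Pillow, assuming the test level dumps frames to PNG files and the offending wall is pure red:

from PIL import Image

def frame_shows_red_bleed(path, threshold=50):
    img = Image.open(path).convert("RGB")
    red_pixels = sum(
        1 for r, g, b in img.getdata()
        if r > 200 and g < 60 and b < 60
    )
    return red_pixels > threshold   # more than a handful of red pixels = the wall leaked through

# e.g. assert not any(frame_shows_red_bleed(f) for f in sorted(glob.glob("frames/*.png")))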
And for unit tests: don't be so quick to write a test so elaborate that throwing it away is painful when you want to refactor the code.
Given that the OP mentioned he's an Android developer, it's worth noting that the Android SDK possibly has the very worst testing environment of any widely used system on the planet, ever*.
* possibly an exaggeration, but it's definitely very very bad.
So I was developing a multicast serial driver. And since I was writing the serial driver, I had to take... over... communication... from the copied-from-ROM driver.
And since there was only 1 working communication port, how did I debug? How did I even run a test?
The board I was developing on was amazing. It had 1 (one) LED.
What specifically would you like to see in terms of testing tools for Android? I ask this not because I don't think there are areas to improve (there are), but to get a better idea of where improvement is most needed.
You're talking about Test-Driven Development (TDD). Any good resources for integrating TDD into JavaScript-based games? I'm still looking for that gem.
To expand on testing a bit: at one end there's unit testing, where you test maybe a single object's behavior in the game. For game dev, I think testing for exact values matters less than testing for deterministic events; we'd rather know whether a player has jumped, died, or shot something.
Eventually, functional testing is needed, once objects interact with one another. It's like a system of systems.
That one's more complicated to design. It might go as far as designing a test that runs through a complete level within milliseconds. Anyone have ideas/resources on this?
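One possible shape for it, assuming the engine can step its simulation headless at a fixed timestep (the Level and bot objects are placeholders):

def run_level_headless(level, bot, timestep=1.0 / 60, max_steps=100_000):
    # No rendering and no real-time waits, so a full play-through takes milliseconds.
    events = []
    for _ in range(max_steps):
        bot.act(level)                      # scripted input: walk right, jump gaps, shoot
        level.step(timestep)                # advance the simulation one fixed tick
        events.extend(level.drain_events()) # e.g. "jumped", "died", "boss_killed"
        if level.completed:
            return events
    raise AssertionError("level never completed")

# A functional test then asserts on the event stream, not exact positions:
# events = run_level_headless(Level("world_1_1"), ScriptedBot())
# assert "died" not in events and "boss_killed" in events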
I do something similar with my web-app code. I keep an iPad open next to me when I code. Anytime I hit Cmd+B in my text-editor, it rsyncs the site to the server and some JS code running on the iPad automatically refreshes the page. It works quite well for creating HTML layout and updating CSS nearly-on-the-fly. If I have 4-5 browsers open on my Mac, they all instantly update too. So all I have to do is Cmd+Tab through them to verify it all looks good.
Editor: BBEdit. Cmd+B is hooked to the following AppleScript for my project:
tell application "BBEdit"
    -- save whatever I'm editing before the sync runs
    save front document
end tell
tell application "Terminal"
    -- run the build/rsync script, reusing an open Terminal window if there is one
    if (count of windows) is 0 then
        do script "/path.to.my/make.command"
    else
        do script "/path.to.my/make.command" in window 1
    end if
end tell
tell application "Google Chrome"
    -- bring the browser forward so I can check the result
    activate
end tell
Brilliant. AppleScript is underrated as an automation language. Sure, it's a bit cumbersome to write, but nobody says you have to use it for everything. It plays well enough with shell scripts that you can leave the heavy lifting to another language and use AppleScript as a bridge between the command line and the UI. A friend and I once wrote a bot in Python to control various music players from IRC. The bot uses AppleScript to talk to iTunes on OS X, and GObject introspection (or whatever they use on GNOME these days; I have little experience with desktop Linux) to talk to Rhythmbox. See http://github.com/GeneralMaximus/amazing-horse
All of Apple's official applications support scripting, as do most good third-party apps. You don't need extra JS on your page to auto-refresh Safari. This little snippet reloads the front-most Safari tab:
tell application "Safari"
    set sameURL to URL of tab 1 of front window
    set URL of tab 1 of front window to sameURL
end tell
You can even send Safari a snippet of JS to run. It's also possible to automate any UI interaction by sending mouse clicks or keyboard events via tell application "System Events".
You can look at any app's AppleScript dictionary using the AppleScript Editor. Go File -> Open AppleScript Dictionary ... and then pick your app.
I do a similar thing in bash for more normal development - basically a script to run my unit tests, look for OK/Error in the output and print either a couple of bars in green or the error output in red. It runs over and over again in a separate monitor, so as soon as you have wrong code, you know about it.
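The same idea sketched in Python rather than bash, in case anyone wants a starting point:

import subprocess, time

GREEN, RED, RESET = "\033[42m", "\033[41m", "\033[0m"

while True:
    # run the whole test suite and look at the exit code instead of grepping for OK/Error
    result = subprocess.run(
        ["python", "-m", "unittest", "discover"],
        capture_output=True, text=True)
    if result.returncode == 0:
        print(GREEN + " " * 60 + RESET)      # all green, carry on
    else:
        print(RED + result.stderr + RESET)   # show the failure output in red
    time.sleep(5)                            # or trigger off file-change events instead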
"He began by building the engine, and to do this he used the 'HotSwap' functionality of the Java JVM 1.4.2, which continuously updates the running code when it detects that a class has changed."
If anybody at Google is reading this: add this feature to the Android emulator and I will literally drive to your house and kiss you on the mouth.
Indeed, the fact that Android doesn't already have this makes it hard for me to take it seriously as a platform. Immediate feedback is crucial to any development environment.
Use an actual device, not the emulator. The emulator is super slow, but with an actual device you can have 30s iteration times.
Edit: I know 30s is still nowhere near ideal. Hell, on Linux Chromium (a massive project) we have faster iterations than that. But it's good compared to the Android emulator.
I think you should be able to go even faster, but I need to dig into how Ant actually works. It seems to be redoing more work than necessary for each build.
I learned some things too. However, even if you have a tiny piece of software, like a game that you can reliably complete manually in 20 minutes, completely playing it through after every change is way too slow for "live coding". You should see your results instantly, just like your test suite should run in seconds. If you have to wait minutes to compile, deploy and test, your productivity will fall. A full integration/acceptance test suite can take longer, but it should be as automatic as possible; you can't use it for rapid feedback like this, but rather for regression/acceptance testing.
Notch was able to take advantage of immediate feedback for most of his coding. The first hours, spent on rendering, were tested by watching the world being rendered live. I've been doing the same developing a game in Clojure. When he got as far as the gameplay, he slowed down considerably, since he had to wander around in the game world to test each new feature. For example, he made temporary shortcut passages so he could test the boss monsters without actually having to pass through the levels.
He is a person who can concentrate on delivering results and hack away at code for long stretches. This is the kind of code he has presumably written again and again for years. What professional web developer couldn't hack together a small site as quickly? Where most people fail is attention span and drive! I know my unfinished projects speak to that :)
The most interesting part of watching Notch code, for me, was seeing how he used his tools. And it was inspiring and motivating to see the progress. The actual code was very hackish, but code quality isn't important in a throw-away project anyway :)
The way the article describes Notch's testing methods sounds like a glaring anti-pattern. This reminds me of my first QA job... Thankfully another post in here suggests that he actually takes pains to mold the level design, etc., to make the game more easily testable.
For those who didn't RTFA (though since this isn't Slashdot, there shouldn't be any): Notch is reputed to have given some segment of his game a complete replay every time he made a change, though it isn't mentioned exactly how often this happened or how big the changes were. A note is made that his build scripts make this almost instantaneous.
This is a problem because manual testing takes forever and causes tester fatigue. You stop doing a good job.
It can be harder to test behavior in a 3D game than in a text-filtering app, but IMHO a good design is a testable one. (This does not necessarily mean I'm that good of a designer yet...)
I'm convinced the moderator of this blog has actually fallen in love with Notch. They won't approve my comment because it's derogatory toward Notch's dev practices...
Especially in this Ludum Dare case, development practices and implementation details don't matter as much as results. And he did get results; that's why it was interesting. I'd like to hear what you think was wrong, and why. This was his show of productivity and results. Other people have other ways, and other circumstances can be completely different. And they should be.
"Right now every box in a model has to be individually animated to match the part it’s attached to, and there’s no grouping or hierarcy [sic]. " -- Notch
To repeat my earlier point: I say this as the owner of a library that provides a bot API for Minecraft: https://github.com/superjoe30/mineflayer
All I'm saying is... "thoroughness" and "testing" are not two words I expected to see in the same sentence as Notch's methodology.