… or, “How to Use Publicly Available Data to Justify Your Balance Complaints on the Boards.”
I get a lot of hits from people searching for “data mining World of Warcraft,” so I’m going to tell you how to do it.
By “data mining,” I mean gathering data on content, systems, and player behavior and sifting through it for patterns.
Blizzard says that “data mining” is against the rules, but they’re talking about something else.
Manipulation of the data stream involves a player altering the flow of information between the World of Warcraft servers and his/her computer while data mining involves a player extracting information from said information that is not intended to be public.
Thottbot and its variants have been around forever and nobody’s done anything about them, so I figure that kind of mining is fair game. Quest mob locations and such are public data, right? That brings us to the first way to data mine WoW.
Option A: Go to the Source
Write a plugin that tracks players and sends the results to you. Plenty of these already exist; Thottbot is the first and most famous. Here’s a diagram of how Thottbot works. The others are similar.
Writing the plugin itself shouldn’t be hard. You just write a Lua script that says “every time this event happens, save it out to here.” You can track any event the UI API exposes. I can see the black helicopters from here!
Unfortunately, you have to save the events to a Lua saved variables file, because WoW won’t let you do real file I/O. This saved-variables-to-XML script looks pretty handy, though. Don’t think you can take the easy way out and copy somebody else’s plugin: Thottbot’s code and saved variables are obfuscated all to hell. I haven’t checked the others.
Alternatively, you can write your own memory-reading app and save the data however you’d like. I’m not entirely sure why this is legal, and it requires much more technical chops. Here’s an interview with the guy who made Curse’s version: he was attempting to do a “Thottbot for Vanguard,” since Vanguard doesn’t have WoW’s UI scripting functionality. Interestingly, the original interview is offline …
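If you go the plugin route, the first chore on your end is turning that saved variables file back into something you can analyze. Here’s a minimal Python sketch (not the XML script mentioned above) that parses a saved variables file into nested dicts. It assumes the file only contains top-level `Name = { ... }` assignments built from double-quoted strings, numbers, `true`/`false`/`nil`, and `["key"] = value` or positional table entries. `MyTrackerDB` and its fields are hypothetical, and obfuscated files like Thottbot’s may use syntax this doesn’t handle.

```python
import re

# Tokens for the small subset of Lua a simple saved variables file uses.
# Whitespace and "--" comments (e.g. "-- [1]") match unnamed alternatives.
TOKEN_RE = re.compile(r'''
    \s+
  | --[^\n]*
  | (?P<str>"(?:[^"\\]|\\.)*")
  | (?P<num>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)
  | (?P<name>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>[{}\[\]=,;])
''', re.VERBOSE)
LITERALS = {"true": True, "false": False, "nil": None}
ESCAPES = {"n": "\n", "t": "\t"}


def tokenize(text):
    """Split Lua source into (kind, text) tokens, skipping whitespace/comments."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        pos = m.end()
        if m.lastgroup is not None:
            tokens.append((m.lastgroup, m.group(m.lastgroup)))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def advance(self):
        tok = self.peek()
        self.i += 1
        return tok

    def expect(self, text):
        _, got = self.advance()
        if got != text:
            raise ValueError(f"expected {text!r}, got {got!r}")

    def value(self):
        kind, text = self.advance()
        if kind == "str":  # simplified unescaping: \n, \t, else the char itself
            return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), text[1:-1])
        if kind == "num":
            return float(text) if any(c in text for c in ".eE") else int(text)
        if kind == "name" and text in LITERALS:
            return LITERALS[text]
        if text == "{":
            return self.table()
        raise ValueError(f"unexpected token {text!r}")

    def table(self):
        # Keyed entries (["k"] = v) keep their key; positional entries get
        # 1-based integer keys, mirroring Lua array indexing.
        result, index = {}, 1
        while self.peek()[1] != "}":
            if self.peek()[1] == "[":
                self.advance()
                key = self.value()
                self.expect("]")
                self.expect("=")
                result[key] = self.value()
            else:
                result[index] = self.value()
                index += 1
            if self.peek()[1] in (",", ";"):
                self.advance()
        self.advance()  # consume the closing "}"
        return result


def parse_saved_variables(text):
    """Parse top-level `Name = value` assignments into a dict."""
    p, out = Parser(tokenize(text)), {}
    while p.peek()[0] is not None:
        kind, name = p.advance()
        if kind != "name":
            raise ValueError(f"expected a variable name, got {name!r}")
        p.expect("=")
        out[name] = p.value()
    return out


# Invented sample data, laid out roughly the way WoW writes these files.
SAMPLE = '''
MyTrackerDB = {
    ["kills"] = {
        { ["mob"] = "Example Mob", ["x"] = 0.5, ["y"] = 0.25 }, -- [1]
    },
    ["version"] = 1,
}
'''

if __name__ == "__main__":
    print(parse_saved_variables(SAMPLE))
```

Positional entries come back as dicts keyed 1, 2, 3… rather than lists, which keeps Lua’s mixed tables intact; convert to lists afterward if you know a table is a pure array.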
Getting people to use your plugin is hard. Thottbot is part of Cosmos, which was very popular back around launch, so it got installed on lots of machines without their owners knowing anything about it. It helps if you have a fancy, helpful website where people are encouraged to use your plugin because they like your site and want to contribute.
Some of them provide one bit of useful in-game functionality: they show the player’s in-game coordinates, which the default UI doesn’t display. Since the website gives locations in those coordinates, you need the plugin installed to find that pesky quest mob.
That’s the other hard part: getting people to upload their data. Cosmos has its own patcher, which sends your data back home to Thottbot while it patches your plugins. Other data mining plugins have their own “patchers” that ship your juicy data home under the guise of patching. Some data mining plugin websites instead let players upload their saved variables files by hand, which takes even more of the players’ time and sounds like an even worse approach.
The bright side of all this work is that, again, you can track whatever you want. You can track talent spec popularity. You can track individual spell use frequency. You can track cyber. Wait, I hear the black helicopters! If you have one of those plugins installed, they really are watching you, and since their code is obfuscated, you can’t even see exactly what they’re doing. (If you want to be even more scared, remember that Thottbot and Allakhazam are owned by IGE.)
Eavesdropping is fun! But the downside is, again, that it’s a lot of work. Additionally, your results will always be skewed toward the power users who install these kinds of things. If you’re just looking for the location of quest mobs, that doesn’t matter. But if you’re trying to answer balance questions and such, your results won’t accurately represent the playerbase. In WoW, only Blizzard gets to see the full data set.
Option B: Bot It
Write a bot. PlayOn gets really cool data out of theirs, and PlayOn is run by a bunch of PhDs who are way smarter than any of us. You’re limited to the information available on the /who list, and I’m not sure you can get much out of it that they haven’t already done.
Option C: Steal It
Data mine the data miners. All those data mining plugins have fancy web front ends to their databases. (*cough*)
They might not be showing you everything — if they are tracking talent spec popularity and how often players cyber and stuff like that, they’re not putting it up on their site. But if you just want to answer content distribution or character advancement questions, it’s all right there. I don’t know how useful this stuff is to people who aren’t reverse-engineering rival devs, but you might like to know that, say, your class pays way more in ability training costs over its lifetime than any other. Take that to the boards!
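Once you’ve scraped the trainer costs off one of those front ends, the training-cost question is just a group-and-sum. A minimal sketch, assuming you already have rows of (class, cost in copper), one per trained rank; that row format is an assumption about how you stored your scrape, and the class names and numbers below are invented. It uses WoW’s 100 copper to a silver and 100 silver to a gold.

```python
from collections import defaultdict


def lifetime_training_cost(rows):
    """Total training cost per class, most expensive first.

    rows: iterable of (class_name, cost_in_copper) pairs (assumed format).
    """
    totals = defaultdict(int)
    for class_name, cost in rows:
        totals[class_name] += cost
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def format_money(copper):
    """Render copper as gold/silver/copper (100c = 1s, 100s = 1g)."""
    gold, rest = divmod(copper, 10_000)
    silver, copper = divmod(rest, 100)
    return f"{gold}g {silver}s {copper}c"


# Invented rows, purely to show the shape of the data.
ROWS = [("Class A", 1500), ("Class B", 900), ("Class A", 2500), ("Class B", 12000)]

if __name__ == "__main__":
    for class_name, total in lifetime_training_cost(ROWS):
        print(class_name, format_money(total))
```

With the toy rows, this prints Class B first at 1g 29s 0c, then Class A at 0g 40s 0c. Swap in your real scrape and the top line is your board post.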
Profile sites like CTProfiles allow you to gather some character distribution data, but it’s a skewed data set and your results won’t accurately reflect the playerbase.
Taking These Options to Other MMOs
Options A and B: Without a fancy event-driven scripting system, your options are limited. See, again, that Curse Gaming system. And again, I’m not really sure why it’s legal. In any case, you need to get lots of people to install it; Curse’s Vanguard database says it only has 338 contributors as of this morning. If you’re just trying to fill out an item and quest database, you only need to get it in the hands of the explorers who are going to see all that stuff. If you’re trying to analyze behavior, you need a better sample.
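To put a number on “a better sample”: if you generously treated each of those 338 contributors as one independent, randomly chosen player, the standard normal-approximation margin of error for a proportion would still be about ±5 percentage points at 95% confidence. The sketch below computes it. The random-sampling assumption is the generous part: plugin users are self-selected, and that bias doesn’t shrink no matter how large n gets.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    Assumes n independent, randomly selected observations; z = 1.96 gives
    roughly 95% confidence. It says nothing about selection bias.
    """
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # p = 0.5 is the worst case (widest interval) for a given n.
    print(round(margin_of_error(0.5, 338), 3))  # → 0.053
```

Because the error scales with 1/sqrt(n), you need four times the contributors to halve it.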
Option C: We have a winner! Other games have websites with interesting data. (*cough*)
If you want to answer balance questions with distribution data, Dark Age of Camelot has those fancy XML files, and EQ2 and Vanguard have those fancy character databases. You get the full data set! I remember DAoC players using it to justify balance complaints on the boards; I haven’t seen that for EQ2 and Vanguard. Then again, I haven’t been looking. The effort varies: for DAoC you can download the files directly, but for Sony’s games you need to write a script.
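For the download-the-files case, a class distribution takes only a few lines of standard-library Python. This sketch assumes a hypothetical schema in which each character is a `<character>` element with `class` and `level` attributes; check the real files’ structure before trusting it. The sample data is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def class_distribution(xml_text, min_level=None):
    """Share of characters per class, optionally only at or above min_level.

    Assumes a hypothetical schema: <character class="..." level="..."/>
    elements anywhere in the document.
    """
    counts = Counter()
    for ch in ET.fromstring(xml_text.strip()).iter("character"):
        if min_level is not None and int(ch.get("level", "0")) < min_level:
            continue
        counts[ch.get("class")] += 1
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()} if total else {}


# Invented sample data in the assumed schema.
SAMPLE = """
<characters>
  <character name="One" class="Class A" level="50"/>
  <character name="Two" class="Class A" level="20"/>
  <character name="Three" class="Class B" level="50"/>
  <character name="Four" class="Class A" level="50"/>
</characters>
"""

if __name__ == "__main__":
    print(class_distribution(SAMPLE))                # all levels
    print(class_distribution(SAMPLE, min_level=50))  # max level only
```

The `min_level` filter matters for balance arguments: the class mix at the level cap can look quite different from the mix across all characters.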
Hope this helps, anonymous Google searchers.