Post by Yomi on Aug 10, 2009 21:04:37 GMT
Since I have my own logger, I've often thought it'd be nice having a community resource that multiple parsers could use.
Version 0.23, 10 Apr 2010: hgdata.xml. Old versions: 0.22, 0.21, 0.20, 0.10.
Here's an example of some ways to parse it in Perl: parse_hgxml.txt. It's a harness for both a regex parser and XML::LibXML, and it also includes an example of XML::Simple. If you're using Perl, I'd recommend either the regex parser or XML::LibXML. With other languages there are certainly lots of options, but anything supporting XPath should make for easy parsing (e.g. REXML in Ruby, xml.xpath in Python, xpath.ahk for AHK, LibXML2 for C, javax.xml.xpath in Java, etc.).
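As a rough illustration of the XPath approach outside Perl, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The element and attribute names (hgdata, area, mob, name, type, race) and the sample values are assumptions invented for the example; the real hgdata.xml layout may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the real hgdata.xml schema may differ.
SAMPLE = """
<hgdata version="0.23">
  <area name="Example Area">
    <mob name="Example Boss" type="boss" race="Outsider"/>
    <mob name="Example Grunt" race="Outsider"/>
    <mob name="Example Captain" type="miniboss" race="Fey"/>
  </area>
</hgdata>
"""

def mobs_of_type(xml_text, area_name, mob_type):
    """Return names of mobs in one area whose type attribute matches."""
    root = ET.fromstring(xml_text)
    # ElementTree understands attribute predicates such as [@name='...'].
    path = f".//area[@name='{area_name}']/mob[@type='{mob_type}']"
    return [mob.get("name") for mob in root.findall(path)]

print(mobs_of_type(SAMPLE, "Example Area", "boss"))
```

Any library with fuller XPath support would let the same query be written as a single expression returning the name attributes directly.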
It has things like:
- map names for each area, useful for loggers. My logger automatically figures out the run you're on by looking at the maps you enter.
- list of things which should be ignored (walls, doors, muscle mass, psi-orbs, etc.). This is used by loggers to keep the mage from getting 10k extra damage in logs by blowing up the Tia pillar of skulls, for instance. These are done both globally and specific to area.
- list of common summons (Balors, Lammasus, etc.)
- list of maps which should not be counted in log. This can be used to keep damage/kills from accumulating if a DM is toying with the party in the Workshop, Wyrm, etc.
- level range. This can be used by loggers to remove bits of low level trash before or after a high level run (e.g. a few Drey before a Hells run, or Half-Orcs before Elysium)
- list of areas. For each area we have the aforementioned map names, ignore list, level range, but also a list of mobs encountered in the area. Each mob can have attributes of:
  - race (e.g. Outsider, Gnome, Fey, Reptilian)
  - type (miniboss, boss)
  - quality (simple quality metric)
  - heals (for mobs that heal on a damage type)
  - paragon level (default is not a paragon)
  - kickback type
  - cr (challenge rating)
  - synonym for mobs with multiple names
  - noxp to indicate mobs which are expected to give no xp
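To make the list above a bit more concrete, here is a rough Python sketch of how a logger might apply the data once it has been parsed. The dictionary layout, key names, and sample values are all hypothetical, and treating the level range as a bound on a mob's cr is an assumption on my part, not something the file format dictates.

```python
# Hypothetical in-memory form of the parsed data; keys and values are
# invented for illustration and are not the real file layout.
DATA = {
    "ignore": {"Door", "Wall"},              # global ignore list
    "skip_maps": {"Example Workshop"},       # maps not counted in the log
    "areas": {
        "Example Run": {
            "maps": {"Example Run - Entrance", "Example Run - Depths"},
            "ignore": {"Pillar of Skulls"},  # area-specific ignore list
            "level_range": (40, 70),         # assumed to bound mob cr
        },
    },
}

def area_for_map(data, map_name):
    """Guess the current run from the map the party just entered."""
    for area, info in data["areas"].items():
        if map_name in info["maps"]:
            return area
    return None

def should_count(data, map_name, target, target_cr):
    """Decide whether damage or a kill against target belongs in the log."""
    if map_name in data["skip_maps"] or target in data["ignore"]:
        return False
    area = area_for_map(data, map_name)
    if area is None:
        return True  # unknown map: only the global rules apply
    info = data["areas"][area]
    if target in info["ignore"]:
        return False
    # Drop low level trash fought just before or after the run.
    low, high = info["level_range"]
    return low <= target_cr <= high
```

For example, should_count(DATA, "Example Run - Entrance", "Pillar of Skulls", 45) returns False because of the area-specific ignore list, so blowing up the pillar would not inflate anyone's damage totals.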
The current version is pretty complete. Low level mobs aren't all entered, and the Abyss data will get some changes as more runs are done. I'm interested in feedback on the data quality as well as (1) the format, especially since there are many ways to represent this stuff in XML, YAML, or JSON, and (2) what other information would be useful.
I decided to take a simple approach with the mobs in terms of paragons, rather than structuring them (i.e. PF turns into Aspirant at para 1, then gets Superior added for para 2 or Elite for para 3). Some more thought needs to go into good ways to describe how the minibosses turn into paragons based on demicount.
Here is some more detailed timing info for parsing the XML. The file is only parsed once when a logger starts up, so it is somewhat irrelevant, but I'm a performance geek so I can't help it. All tests were run on Intel Core2 quad-core machines. Perl version 5.10 was used on all machines.
Machine | Regex | XML::LibXML |
Windows Cygwin | 31ms | 117ms |
Windows Strawberry Perl | 31ms | 57ms |
Fedora 11 Perl | 28ms | 57ms |
Fedora 12 on an i7-920 runs this just a little faster than the Q6600 running Fedora 11. ActiveState Perl 5.8.8 on the Windows machine runs at the same speed as Cygwin 5.8.8 on that machine. There wasn't much difference between 5.8.8 and 5.10. Cygwin and ActiveState Perl on Windows XP have some odd performance hiccups. My own code runs at the same speed, but some libraries run strangely slowly: LibXML runs about 2x slower and XML::Simple runs about 5x slower. Strawberry Perl on XP has none of these issues. They all, however, produce identical output (it'd be pretty scary if they didn't).
I did have some timings using XML::Simple, but XML::LibXML is much nicer and faster as well. The regex version runs reasonably fast on all platforms. Both the regex and LibXML code fill in identical associative arrays with all the data (LibXML using XPath).
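For anyone curious what "two parsers filling identical structures" looks like in another language, here is a small Python sketch. It is not a port of parse_hgxml.txt: the sample XML, its element and attribute names, and the function names are all made up for illustration, and the regex parser only copes with input formatted exactly like the sample (no entity decoding, no reordered markup).

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical sample; the real hgdata.xml layout may differ.
SAMPLE = """<hgdata>
  <area name="Example Area">
    <mob name="Example Boss" type="boss" cr="60"/>
    <mob name="Example Grunt" cr="45"/>
  </area>
</hgdata>"""

AREA_RE = re.compile(r'<area name="([^"]*)">(.*?)</area>', re.S)
MOB_RE = re.compile(r'<mob\s+([^>]*?)/>')
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_regex(text):
    """Build {area: {mob: {attr: value}}} using only regular expressions."""
    data = {}
    for area, body in AREA_RE.findall(text):
        mobs = data.setdefault(area, {})
        for attrs in MOB_RE.findall(body):
            fields = dict(ATTR_RE.findall(attrs))
            mobs[fields.pop("name")] = fields
    return data

def parse_tree(text):
    """Build the same structure with ElementTree's XPath-style queries."""
    data = {}
    for area in ET.fromstring(text).findall("./area"):
        mobs = data.setdefault(area.get("name"), {})
        for mob in area.findall("./mob"):
            fields = dict(mob.attrib)
            mobs[fields.pop("name")] = fields
    return data

# Both parsers should fill in identical dictionaries.
assert parse_regex(SAMPLE) == parse_tree(SAMPLE)

for fn in (parse_regex, parse_tree):
    secs = timeit.timeit(lambda: fn(SAMPLE), number=1000)
    print(f"{fn.__name__}: {secs * 1000:.1f} ms per 1000 parses")
```

The equality check is the useful part: whichever parser a logger ships with, comparing its output against the other one is a cheap way to catch mistakes in either.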
I'm using the regex parser in my logger because it's fast and it works everywhere without having to worry about installing modules. LibXML is pretty spiffy though, and its performance is good.