XML data for parsers

Yomi
Dungeon Master

Posts: 1,666

XML data for parsers Aug 10, 2009 21:04:37 GMT


Post by Yomi on Aug 10, 2009 21:04:37 GMT

Since I have my own logger, I've often thought it'd be nice to have a community resource that multiple parsers could use.

Version 0.23, 10 Apr 2010: hgdata.xml. Old versions: 0.22, 0.21, 0.20, 0.10.

Here's an example of some ways to parse it in Perl: parse_hgxml.txt. It's a harness for both a regex parser and XML::LibXML, and it also includes an example of XML::Simple. If you're using Perl, I'd recommend either the regex parser or XML::LibXML. Other languages certainly have lots of options, but anything supporting XPath should make parsing easy (e.g. REXML in Ruby, xml.xpath in Python, xpath.ahk for AHK, libxml2 for C, javax.xml.xpath in Java, etc.).
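For Python, a minimal sketch using the standard library's xml.etree.ElementTree, which supports only a limited XPath subset (full XPath would need something like lxml). The sample document and its element and attribute names (area, map, mob, heals, qual) are assumptions for illustration, not the actual hgdata.xml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample. The element and attribute names are
# assumptions for illustration only; check the real hgdata.xml for its schema.
SAMPLE = """<?xml version="1.0" encoding="ISO-8859-1"?>
<hgdata>
  <area name="Example Run">
    <map name="Example Map 1"/>
    <mob name="Example Boss" type="boss" heals="Acid"/>
    <mob name="Example Trash" qual="0.5"/>
  </area>
</hgdata>"""

def load_mobs(xml_bytes):
    """Return {mob name: attribute dict} for every mob in every area."""
    root = ET.fromstring(xml_bytes)
    mobs = {}
    # ".//area/mob" selects every <mob> that is a direct child of any <area>.
    for mob in root.findall(".//area/mob"):
        mobs[mob.get("name")] = dict(mob.attrib)
    return mobs

def healers(xml_bytes):
    """Names of mobs that carry a heals attribute, via an XPath predicate."""
    root = ET.fromstring(xml_bytes)
    return [m.get("name") for m in root.findall(".//mob[@heals]")]

data = SAMPLE.encode("iso-8859-1")
mobs = load_mobs(data)
print(mobs["Example Boss"]["heals"])
print(healers(data))
```

Missing attributes simply don't appear in the per-mob dict, so defaults (such as qual falling back to 1.0) are left to the caller.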

It has things like:

- map names for each area, useful for loggers. My logger automatically figures out the run you're on by looking at the maps you enter.

- list of things which should be ignored (walls, doors, muscle mass, psi-orbs, etc.). Loggers use this to keep the mage from getting 10k extra damage in the logs by blowing up the Tia pillar of skulls, for instance. Ignore lists exist both globally and per area.

- list of common summons (Balors, Lammasus, etc.)

- list of maps which should not be counted in log. This can be used to keep damage/kills from accumulating if a DM is toying with the party in the Workshop, Wyrm, etc.

- level range. This can be used by loggers to remove bits of low level trash before or after a high level run (e.g. a few Drey before a Hells run, or Half-Orcs before Elysium)

- list of areas. For each area we have the aforementioned map names, ignore list, and level range, plus a list of mobs encountered in the area. Each mob can have these attributes:
  - race (e.g. Outsider, Gnome, Fey, Reptilian)
  - type (miniboss, boss)
  - quality (a simple quality metric)
  - heals (for mobs that heal from a damage type)
  - paragon level (default is not a paragon)
  - kickback type
  - cr (challenge rating)
  - synonym (for mobs with multiple names)
  - noxp (marks mobs which are expected to give no XP)
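A minimal Python sketch of how a logger might apply the area data above: guess the run from the map entered, then drop events for no-count maps, ignored targets, and out-of-range levels. The Area fields, the placeholder ignore and no-count lists, and the assumption that the level range is compared against the target's level/CR are all hypothetical, not taken from the data file.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory shape for one area entry; field names are illustrative.
@dataclass
class Area:
    name: str
    maps: set
    ignore: set = field(default_factory=set)
    min_level: int = 1
    max_level: int = 60

# Placeholder global lists; the real ones would come from the data file.
GLOBAL_IGNORE = {"Door", "Wall"}
NOCOUNT_MAPS = {"Example Workshop"}

def area_for_map(areas, map_name):
    """Guess the current run from a map the party has entered."""
    for area in areas:
        if map_name in area.maps:
            return area
    return None

def should_count(area, map_name, target, target_level):
    """Return True if a damage/kill event against `target` belongs in the log."""
    if map_name in NOCOUNT_MAPS:
        return False  # e.g. a DM toying with the party in a special map
    if target in GLOBAL_IGNORE:
        return False
    if area is None:
        return True   # unknown run: no area-specific rules to apply
    if target in area.ignore:
        return False
    # Assumption: the level range is compared against the target's level/CR.
    return area.min_level <= target_level <= area.max_level
```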

The current version is pretty complete. Low level mobs aren't all entered, and the Abyss data will get some changes as more runs are done. I'm interested in feedback on the data quality as well as on (1) the format, especially since there are many ways to represent stuff in XML (not to mention YAML and JSON), and (2) what other information would be useful.

I decided to take a simple approach with the mobs in terms of paragons, rather than structuring them (e.g. PF turns into Aspirant at para 1, then gets Superior added for para 2 or Elite for para 3). Some more thought needs to go into good ways to describe how the minibosses turn into paragons based on demicount.

Here is more detailed timing info for parsing the XML. The file is only parsed once when a logger starts, so this is somewhat irrelevant, but I'm a performance geek and can't help it. All tests were run on Intel Core 2 quad-core machines. Perl version 5.10 was used on all machines.

Machine	Regex	XML::LibXML
Windows Cygwin	31ms	117ms
Windows Strawberry Perl	31ms	57ms
Fedora 11 Perl	28ms	57ms

Fedora 12 on an i7-920 runs this just a little faster than the Q6600 running Fedora 11. ActiveState Perl 5.8.8 on the Windows machine runs at the same speed as Cygwin Perl 5.8.8 on that machine; there wasn't much difference between 5.8.8 and 5.10. Cygwin and ActiveState Perl on Windows XP have some odd performance hiccups: my code runs at the same speed, but some libraries run strangely slowly. LibXML runs about 2x slower and XML::Simple about 5x slower. Strawberry Perl on XP has none of these issues. They all, however, produce identical output (it'd be pretty scary if they didn't).

I also timed XML::Simple, but XML::LibXML is much nicer and faster as well. The regex version runs reasonably fast on all platforms. Both the regex and LibXML code fill in identical associative arrays with all the data (the LibXML version uses XPath).

I'm using the regex parser in my logger because it's fast and works everywhere without having to worry about installing modules. LibXML is pretty spiffy though, and its performance is good.
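A rough Python analogue of the regex-vs-parser comparison, as a sketch only: it times a regex extractor against xml.etree.ElementTree on a synthetic file and checks that both fill identical dictionaries. The sample layout is invented, and the regex only works because that layout is fixed (one attribute order, double quotes); it is not a general XML parser. Absolute timings will depend entirely on the machine.

```python
import re
import time
import xml.etree.ElementTree as ET

# Synthetic sample; the real file is larger and its layout may differ.
SAMPLE = "<hgdata>" + "".join(
    f'<mob name="Mob {i}" qual="{1 + i % 3}"/>' for i in range(2000)
) + "</hgdata>"

# Relies on the machine-written layout: name first, then qual.
MOB_RE = re.compile(r'<mob name="([^"]*)" qual="([^"]*)"\s*/>')

def parse_regex(text):
    return {name: qual for name, qual in MOB_RE.findall(text)}

def parse_etree(text):
    root = ET.fromstring(text)
    return {m.get("name"): m.get("qual") for m in root.iter("mob")}

def best_of(fn, arg, runs=5):
    """Best wall-clock time in milliseconds over several runs."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        times.append((time.perf_counter() - start) * 1000)
    return min(times)

# Both approaches should fill identical dictionaries.
assert parse_regex(SAMPLE) == parse_etree(SAMPLE)
print(f"regex: {best_of(parse_regex, SAMPLE):.1f} ms")
print(f"etree: {best_of(parse_etree, SAMPLE):.1f} ms")
```

Taking the best of several runs reduces noise from caching and other processes, which matters when the numbers are in the tens of milliseconds.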

Last Edit: Apr 10, 2010 23:31:19 GMT by Yomi

Real Time Current Run Log
Run Log archive

Yomi
Dungeon Master

Posts: 1,666

XML data for parsers Aug 10, 2009 21:34:39 GMT


Post by Yomi on Aug 10, 2009 21:34:39 GMT

I chose XML since people seem comfortable with it. YAML and JSON are a couple of other alternatives. Both of those are a little easier to read, though the XML seems pretty straightforward. All of them are pretty simple to parse and language-neutral.

TJ
Moderator

Posts: 1,059

XML data for parsers Aug 10, 2009 21:47:42 GMT


Post by TJ on Aug 10, 2009 21:47:42 GMT

Okay, so I know this isn't quite the place for this, but given that Yomi's XML file contains the damage type each mob heals from and by what %, is there a way for someone to add a function to their program that hovers this over NWN when one of these mobs spawns?

ex.
A Superior Barbazu Razor spawns, and when it appears in the combat log, the parser outputs "Acid, 600%" (or whatever it heals from) hovering over NWN in the top left or something.

.02 cents deposited,

TJ

Guild Supremacy

My Toons

Yomi
Dungeon Master

Posts: 1,666

XML data for parsers Aug 10, 2009 22:11:15 GMT


Post by Yomi on Aug 10, 2009 22:11:15 GMT

I was planning on adding it to mine, though until I make some changes to the infrastructure there is a delay (mine is more an analyzer that I've abused into a logger using a web page). But if you have a real-time one with a window, then you could certainly use the data to immediately tell you the healing element for any mob it sees, and also point out who has healed it in the last 2 rounds.
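A minimal Python sketch of that real-time idea: announce a mob's healing element the first time it shows up, and remember who dealt that damage type to it, round by round. The log-line format, the round numbering, and the heal table entry (TJ's illustrative "Acid, 600%") are all assumptions; real NWN combat-log lines look different, so the pattern would need adapting.

```python
import re
from collections import defaultdict

# Hypothetical heal table: mob name -> (damage type it heals from, heal %).
# The entry is TJ's illustrative example, not a value from hgdata.xml.
HEALS = {"Superior Barbazu Razor": ("Acid", 600)}

# Hypothetical simplified log format: "<attacker> damages <target>: <amount> (<type>)".
DAMAGE_RE = re.compile(
    r"^(?P<attacker>.+?) damages (?P<target>.+?): (?P<amount>\d+) \((?P<dtype>\w+)\)$"
)

class HealWatcher:
    def __init__(self, window=2):
        self.window = window              # how many recent rounds to report on
        self.announced = set()            # mobs already announced
        self.healers = defaultdict(dict)  # mob -> {attacker: last round they healed it}

    def feed(self, rnd, line):
        """Process one log line seen in round `rnd`; return an overlay string or None."""
        m = DAMAGE_RE.match(line)
        if not m or m.group("target") not in HEALS:
            return None
        target, attacker = m.group("target"), m.group("attacker")
        element, pct = HEALS[target]
        if m.group("dtype") == element:
            self.healers[target][attacker] = rnd
        if target not in self.announced:
            self.announced.add(target)
            return f"{target}: {element}, {pct}%"
        return None

    def offenders(self, target, current_round):
        """Attackers who dealt the healing damage type within the last `window` rounds."""
        oldest = current_round - self.window + 1
        return sorted(a for a, r in self.healers.get(target, {}).items() if r >= oldest)
```

A display loop would call feed() for each new line, show any returned string, and refresh offenders() for the mobs currently in view.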

Last Edit: Aug 10, 2009 22:11:39 GMT by Yomi

TJ
Moderator

Posts: 1,059

XML data for parsers Aug 10, 2009 22:15:36 GMT

Post by TJ on Aug 10, 2009 22:15:36 GMT

Hmm, okay. I'd try to add it myself, but I know nothing about Python or any of the languages you guys use... I'm going to be learning Java, and I know some HTML, but beyond that, code just goes right over my head.

Yomi
Dungeon Master

Posts: 1,666

XML data for parsers Aug 12, 2009 16:34:23 GMT

Post by Yomi on Aug 12, 2009 16:34:23 GMT

Updated to second draft. Simple parser included.

dirk
Veteran

Posts: 137

XML data for parsers Aug 13, 2009 12:45:35 GMT

Post by dirk on Aug 13, 2009 12:45:35 GMT

Your link in the first post is missing the "l" in .xml. -db

Author of HG Webdash hgdash.randomsuspect.com/

Yomi
Dungeon Master

Posts: 1,666

XML data for parsers Aug 13, 2009 13:33:04 GMT

Post by Yomi on Aug 13, 2009 13:33:04 GMT

Fixed, thanks.

Yomi
Dungeon Master

Posts: 1,666

XML data for parsers Aug 13, 2009 15:28:49 GMT


Post by Yomi on Aug 13, 2009 15:28:49 GMT

I just modified my parser to use the XML data -- removing some of the hard coded things. It was pretty easy adding the healing and swing quality metrics to the output. I did my best to verify the hells mob data matched what I already had. I have some more verification work I can do on the other run data.

I can see one minor schema issue, not sure if it is worth fixing. I'm using an attribute rather than an element for synonyms (typically corrected typos), which means only one is allowed. Honestly most people wouldn't care about synonyms in the first place -- I occasionally want to parse logs from early 2007 to present, which means I need to be able to handle anything in the logs. For most people running a logger they'd just replace the old name with the new one and never notice. Also it'd be pretty easy just to make a separate mob entry for the old name if needed.
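A Python sketch of how a logger could fold old names into current ones, supporting both the single synonym attribute described here and a repeated child element that would allow several. The attribute name syn, the element name synonym, and the misspellings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: an attribute allows one alias per mob, while
# repeated <synonym> children would allow several. All names are made up.
FRAGMENT = """<area name="Example Run">
  <mob name="Example Mob" syn="Exmaple Mob"/>
  <mob name="Other Mob">
    <synonym>Othr Mob</synonym>
    <synonym>Other Mobb</synonym>
  </mob>
</area>"""

def alias_map(xml_text):
    """Map every known name (canonical or synonym) to the canonical name."""
    aliases = {}
    for mob in ET.fromstring(xml_text).iter("mob"):
        name = mob.get("name")
        aliases[name] = name
        if mob.get("syn"):
            aliases[mob.get("syn")] = name
        for syn in mob.findall("synonym"):
            aliases[syn.text.strip()] = name
    return aliases

def canonical(aliases, logged_name):
    """Names that aren't in the data fall through unchanged."""
    return aliases.get(logged_name, logged_name)

aliases = alias_map(FRAGMENT)
print(canonical(aliases, "Exmaple Mob"))
print(canonical(aliases, "Other Mobb"))
```

Normalizing names once, as log lines are read, keeps every later lookup (heals, qual, ignore lists) keyed on the current name only.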

separ
Journeyman

Posts: 79

XML data for parsers Sept 9, 2009 6:55:35 GMT

Post by separ on Sept 9, 2009 6:55:35 GMT

Data looks very nice, but I can't download the parse_hgxml.pl parser script ... server error.

Yomi
Dungeon Master

Posts: 1,666

XML data for parsers Sept 9, 2009 13:20:29 GMT

Post by Yomi on Sept 9, 2009 13:20:29 GMT

Changed to .txt. I've been using the data for my logger with the trivial parser, and it's been working well so far.

separ
Journeyman

Posts: 79

XML data for parsers Sept 10, 2009 8:29:19 GMT


Post by separ on Sept 10, 2009 8:29:19 GMT

Thanks a lot.
I'm about to integrate your data with the yal logger.

Just some more questions ...

Any special reason you specified encoding="ISO-8859-1" instead of utf-8 ?

What's the 'qual' property for ?

Where do the different boss types come from and what do they mean?

Are you still updating the file with new data? Looking at the hells bestiary in the wiki, some mobs seem to heal from exotic damage which is not yet in the file. And potentially other interesting properties like saves, spell immunities, ...

many thanks for your work

Yomi
Dungeon Master

Posts: 1,666

XML data for parsers Sept 10, 2009 15:56:03 GMT


Post by Yomi on Sept 10, 2009 15:56:03 GMT

Great on the YAL incorporation -- I can do it if you'd prefer, it's been on my todo list for a while. I do wish YAL would use the !echo commands instead of whisper.

No special reason on the encoding. Most I've seen are ISO-8859-1. I would think for what we're encoding we wouldn't need anything more than ISO-8859-1, but it doesn't really matter which we use.
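A small Python sketch illustrating the point: as long as the declaration matches the actual bytes, a conforming parser yields the same text either way, and for plain ASCII names the bytes are identical anyway. The mob name is only an example.

```python
import xml.etree.ElementTree as ET

def parse_with(encoding, text):
    """Serialize `text` in `encoding` with a matching declaration, then parse it back."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?>\n<mob name="{text}"/>'
    return ET.fromstring(doc.encode(encoding)).get("name")

# ASCII-only names come back identical under either declaration,
# and so do Latin-1 characters when the declaration matches the bytes.
print(parse_with("ISO-8859-1", "Pit Fiend"), parse_with("UTF-8", "Pit Fiend"))
print(parse_with("ISO-8859-1", "Café"), parse_with("UTF-8", "Café"))
```

The only way to get this wrong is a mismatch, e.g. saving the file as UTF-8 while the declaration still says ISO-8859-1.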

qual is used to roughly indicate the quality of the mob, with anything not having a qual field assumed to be 1.0. The theory is that we can encourage more than just raw damage output by giving people another metric. Hitting PFs raises your swing quality more than hitting lemures.
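One plausible way to turn per-mob qual values into a per-player "swing quality" number, as a Python sketch: average the qual of every mob a player hit, with mobs lacking a qual counting as 1.0. The averaging rule and the placeholder qual values are assumptions, not the metric any existing logger actually uses.

```python
# Placeholder qual values for illustration only; real values live in the data file.
QUAL = {"Pit Fiend": 2.0, "Lemure": 0.5}

def swing_quality(hit_targets, qual_by_mob):
    """Average qual over all hits; mobs without a qual entry count as 1.0.

    Assumption: a simple per-hit average. Weighting by damage dealt
    would be an equally reasonable alternative.
    """
    if not hit_targets:
        return 0.0
    total = sum(qual_by_mob.get(mob, 1.0) for mob in hit_targets)
    return total / len(hit_targets)

print(swing_quality(["Pit Fiend", "Pit Fiend", "Lemure", "Lemure"], QUAL))
print(swing_quality(["Lemure", "Lemure"], QUAL))
```

With these placeholder numbers, a player splitting hits evenly between PFs and lemures scores above one who only hit lemures, which is the behavior described above.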

As for the type field, it's definitely something in progress. I'm not currently using it so haven't been in any rush to correct the data. For the Hells I was using "Boss #" to mean the boss of some level, and "Miniboss #" to mean a miniboss on that level. Alternately the Hells could be broken out into separate areas, though that would mean a lot of mob duplication especially with randoms. It would solve some other issues such as the map names.
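If type values keep the "Boss #" / "Miniboss #" convention, a parser could split them into a kind and a level. A Python sketch, assuming the level number is optional and case doesn't matter (both assumptions):

```python
import re

TYPE_RE = re.compile(r"^(?P<kind>Boss|Miniboss)\s*(?P<level>\d+)?$", re.IGNORECASE)

def parse_type(type_field):
    """Split a type value like "Boss 3" or "Miniboss 2" into (kind, level).

    Returns (None, None) for missing or unrecognized values; the level is
    None when the value carries no number (e.g. plain "boss").
    """
    if not type_field:
        return (None, None)
    m = TYPE_RE.match(type_field.strip())
    if not m:
        return (None, None)
    level = int(m.group("level")) if m.group("level") else None
    return (m.group("kind").lower(), level)

print(parse_type("Boss 3"))
print(parse_type("Miniboss 2"))
```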

I updated the file a week ago, though it was mainly with race info. I added healing info for a couple of non-Hells mobs, but I thought the Hells mobs were complete (barring the minibosses, especially with their healing). Brachinas are listed as healing from Pos, Magebanes from Magic, Excruciarchs from Neg, Infestiarchs from Neg, Narzugons from Pos, etc.

The Holla score could be computed using the kb field, though with so many people having tear stars, machine spears, and the like I'm not sure it's as useful as it used to be. I'm hoping that field can be used instead of making a specific "nohit" one, but let me know.

It's definitely a file that would work well with a version management system. I think we have subversion running on hgweb.

separ
Journeyman

Posts: 79

XML data for parsers Sept 10, 2009 17:53:40 GMT


Post by separ on Sept 10, 2009 17:53:40 GMT

Sept 10, 2009 15:56:03 GMT Yomi said:

> Great on the YAL incorporation -- I can do it if you'd prefer, it's been on my todo list for a while. I do wish YAL would use the !echo commands instead of whisper.

Probably a good idea, but so far I've only made visual changes and haven't touched any of the game and control logic, with 2 exceptions:
- added the .who command and shrubbed automatic party recognition logic (already posted in yal-thread)
- added data import from your XML file (not yet posted)

> No special reason on the encoding. Most I've seen are ISO-8859-1. I would think for what we're encoding we wouldn't need anything more than ISO-8859-1, but it doesn't really matter which we use.

It shouldn't affect anything with the current data, but nowadays everything (especially on the Linux/Unix/web side) seems to default to UTF-8. Maybe the XML parser runs conversion routines anyway, "just to be safe", and takes more time because of that?

> qual is used to roughly indicate the quality of the mob, with anything not having a qual field assumed to be 1.0. The theory is that we can encourage more than just raw damage output by giving people another metric. Hitting PFs raises your swing quality more than hitting lemures.

> As for the type field, it's definitely something in progress. I'm not currently using it so haven't been in any rush to correct the data. For the Hells I was using "Boss #" to mean the boss of some level, and "Miniboss #" to mean a miniboss on that level. Alternately the Hells could be broken out into separate areas, though that would mean a lot of mob duplication especially with randoms. It would solve some other issues such as the map names.

> I updated the file a week ago, though it was mainly with race info. I added healing info for a couple of non-Hells mobs, but I thought the Hells mobs were complete (barring the minibosses, especially with their healing). Brachinas are listed as healing from Pos, Magebanes from Magic, Excruciarchs from Neg, Infestiarchs from Neg, Narzugons from Pos, etc.

> The Holla score could be computed using the kb field, though with so many people having tear stars, machine spears, and the like I'm not sure it's as useful as it used to be. I'm hoping that field can be used instead of making a specific "nohit" one, but let me know.

As said above, I only use the additional data for display, and making all the changes I want here has so far kept me from posting an updated version. Will do so soonish...
For me it's quite helpful, when looking at the real-time info, to see the most important data for each mob color-coded right beside its name. The vets here probably know all that stuff already, but I'm still a rookie.

> It's definitely a file that would work well with a version management system. I think we have subversion running on hgweb.

hgweb-account from the wiki? or something else?
Anyway, I'm not a big fan of svn, locally I manage all my changes with git.

Edit: I forgot to say ... maybe we should add a link to this thread in the bestiary-pages in the wiki ...

Last Edit: Sept 10, 2009 17:56:18 GMT by separ

separ
Journeyman

Posts: 79

XML data for parsers Sept 10, 2009 21:15:38 GMT


Post by separ on Sept 10, 2009 21:15:38 GMT

Sept 10, 2009 15:56:03 GMT Yomi said:

> The Holla score could be computed using the kb field, though with so many people having tear stars, machine spears, and the like I'm not sure it's as useful as it used to be. I'm hoping that field can be used instead of making a specific "nohit" one, but let me know.

Just noticed on the min run: the Tears should have a nohit-flag and they don't have a kb-flag.
