[Ohrrpgce] SVN: pkmnfrk/2428 Breaking the ground on my XML idea. for the time being, I'm not hacking

David Gowers 00ai99 at gmail.com
Wed Nov 5 17:09:49 PST 2008


I should get around to unsubscribing sometime :)

On Thu, Nov 6, 2008 at 5:51 AM, James Paige <Bob at hamsterrepublic.com> wrote:
> On Wed, Nov 05, 2008 at 01:41:23PM -0500, Mike Caron wrote:
>> James Paige wrote:
>> >On Wed, Nov 05, 2008 at 12:03:28AM -0800, subversion at HamsterRepublic.com
>> >wrote:
>> >>pkmnfrk
>> >>2008-11-05 00:03:27 -0800 (Wed, 05 Nov 2008)
>> >>467
>> >>Breaking the ground on my XML idea. for the time being, I'm not hacking
>> >>the OHR directly. Instead, I'm going to work on a separate "test" program.
>> >>
>> >>The main reason is because using libxml adds three DLLs as dependencies,
>> >>and I'm going to try and make them all static libraries, or at least roll
>> >>them into one DLL.
>> >>
>> >>I assume this is not such a big deal on Linux, since Linux users are used
>> >>to having to apt-get a bunch of stuff before trying new things out,
>> >>right? :P
>> >
>> >I was reading up on the differences between XML, YAML and JSON. I really
>> >liked the minimalism of those other formats. XML really is astonishingly
>> >bulky.
>>
>> Truthfully, I'd never heard of YAML until this very moment. However, I
>> glossed over the spec to get an idea of it, and here are my thoughts,
>> and why I like XML.
>>
>> YAML: It uses indentation for block levels. As I'm sure I've expressed
>> before, I dislike this, due to the possibility of screwing it up by
>> having the wrong tab settings as the last guy. Indentation is good,
>> delineation by indentation is bad, IMO. It also prevents me from copying
>> one block from any random place and popping it in wherever I want
>> without fixing the indentation. (altogether, this is also why I
>> personally can't stand python)
(Because it forces you to have at least moderately readable code? :)

In my experience, much of the time when I'm moving a block of code
from one level of nesting to another, there are unexpected
consequences (e.g. it's looking for a local variable provided by an
outer scope, or its behaviour is designed to work in a loop). So IMO
having to reindent is a feature.

(It's also pretty easy to do in a decent programmer's editor, e.g. in
Emacs I can select a region and use ctrl+c,> or ctrl+c,< to do it.
Tabbing also auto-guesses the right indentation.)

Tab settings could be an issue if you like using tabs at all.
Personally I find 2 spaces a comfortable indentation step for YAML so
I never use them. Tabs in Python code are also considered bad style.
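
To make the tab-settings worry concrete, here is a minimal sketch,
assuming a Python 3 interpreter (stricter on this point than the
Python 2 releases current at the time of writing). The snippet string
and the helper name check_indentation are made up for illustration.
One body line is indented with a tab and the next with eight spaces,
so whether they line up depends on the reader's tab width.

```python
# Hypothetical snippet: the first body line is indented with a tab,
# the second with eight spaces. They only align if a tab is 8 columns.
SOURCE = "if True:\n\tx = 1\n        y = 2\n"


def check_indentation(source):
    """Return None if the source compiles, else the name of the error raised."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:  # TabError and IndentationError are subclasses
        return type(exc).__name__
    return None


print(check_indentation(SOURCE))
```

Indenting with spaces only, as suggested above, avoids the ambiguity
entirely.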

>
> Interesting. That never bothered me about python, because I was already
> used to re-indenting when copying-and-pasting in every other language I
> have worked in. (just because python is the only language that enforces
> indentation, doesn't mean I don't religiously indent in every other
> language)
>
>> JSON: If you look at the example page
>> (http://www.json.org/example.html), then you'll notice something
>> disconcerting. Barring the last example (a servlet definition), the XML
>> versions of all those examples are more readable. It may not be as
>> concise, but it... I dunno, it looks better to me. Also, the fact that I
>> can't see at a glance which brace matches which block bugs me, slightly.

JSON is basically a coincidental subset of YAML (it matches the inline
form of YAML). That may be an advantage in terms of increasing the
number of people who understand the format.

You can mix the inline and block forms of YAML.
For example, from my library of dither matrices:

(looks a lot more sensible in monospace)
---

bayer:
  set : [bayer2x2, bayer4x4, bayer8x8, bayer16x16, bayer32x32]

## standard matrices

bayer2x2 :
  shape : 2x2
  data : >-
    4 2
    1 3

cluster3x3 :
  shape : 3x3
  data : >-
    9 7 5
    4 8 3
    2 6 1

disperse3x3 :
  shape : 3x3
  data : >-
    9 6 2
    4 1 7
    3 5 8

bayer4x4 :
  shape : 4x4
  data : >-
     16   8  13   5
      4  12   1   9
     14   6  15   7
      2  10   3  11

---

The definition of the 'bayer' set uses the inline form, everything
else uses block form.
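
A minimal sketch of how one of these entries could be turned back into
a matrix once loaded, assuming the loader hands over 'shape' and 'data'
as plain strings. The function name matrix_from_entry is hypothetical,
and this is not necessarily how the library above is actually read.

```python
def matrix_from_entry(shape, data):
    """Rebuild a list-of-rows integer matrix from 'shape' and 'data' strings."""
    # Assumption: shape is "<rows>x<columns>"; for the square matrices
    # shown above the order makes no difference.
    rows, cols = (int(n) for n in shape.split("x"))
    # split() with no argument treats runs of spaces and newlines alike,
    # so it does not matter how the scalar's line breaks were folded.
    values = [int(v) for v in data.split()]
    if len(values) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(values)}")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


# The bayer2x2 entry with its two data lines joined by a space,
# which is what the '>-' folded scalar produces.
bayer2x2 = {"shape": "2x2", "data": "4 2 1 3"}
print(matrix_from_entry(bayer2x2["shape"], bayer2x2["data"]))
# prints [[4, 2], [1, 3]]
```

Checking the value count against the shape catches a row that was
mistyped or dropped when the file was edited by hand.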


>
> Ah, I think I spot a big reason why we disagree about this. For you, an
> important goal of using XML is the human-readability, right?
I have to say, when people call XML "human-readable", my answer is
.. no.
A nicely serialized XML file can be quite readable (e.g. Inkscape
SVGs), but readability isn't a built-in trait like it is for YAML.
A lot of the XML files I've seen are rather ugly, particularly in
having far too many levels of nesting and requiring me to scroll
sideways to read them. I believe that's because the particular data
they are storing is actually ill-suited to XML (which is oriented
towards trees and, being a Markup Language, is optimized for textual
data).

>
> For me, human readability is nice-to-have, but it isn't a big deal. I
> can see myself looking at the text of a data file to verify that it
> serialized correctly, but I wouldn't dream of hand editing or
> copying-and-pasting unless I had no other choice.
>
> What is cool about XML to me is the fact that it can store hierarchical
> tree-like data, the fact that it can store variable length lists of
> arbitrary objects, and the fact that new data members can be easily
> inserted and old data will just get the defaults.
>
> ...But other formats fit the above description just fine. I could even
> make a binary format that had all those features.
HDF5 :)  (it's even a self-documenting format:)

Note: that's not a suggestion or anything; HDF is cool, but seriously
heavy duty.
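
The "old data will just get the defaults" property quoted above is not
tied to any one format. A minimal sketch using XML and Python's
standard xml.etree.ElementTree, where the element names, the DEFAULTS
table and load_textbox are all hypothetical and not taken from the
OHR:

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults for a text box record; none of these names
# come from the OHR.
DEFAULTS = {"text": "", "portrait": "none", "speed": "1"}


def load_textbox(xml_text):
    """Read a <textbox> element, filling in defaults for missing children."""
    root = ET.fromstring(xml_text)
    record = dict(DEFAULTS)
    for child in root:
        # Members this reader does not know about are simply ignored.
        if child.tag in record:
            record[child.tag] = child.text or ""
    return record


# An "old" file written before portrait and speed existed.
old_file = "<textbox><text>Hello</text></textbox>"
print(load_textbox(old_file))
# prints {'text': 'Hello', 'portrait': 'none', 'speed': '1'}
```

The same pattern works with a YAML or JSON mapping: start from the
defaults and overwrite whatever the file actually provides.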
>
>> XML: Yes, XML is the bulkiest of the lot. But, it's the most mature
>> format as well. And, lots of tool support is available. But, the thing
>> that really sells me on XML is that I could, in theory, write an XSLT
>> stylesheet, attach it to my textbox document, and get a preview of every
>> textbox in my game, for example. Or, also in theory, I could embed every
>> data lump in one document (if I ever decided that was necessary or a
>> good idea).
>
> If we are talking XML as an import/export format, I can't find any
> argument against it, but as an internal format, I don't care for it.

If you're using XML, import/export is a nice place to use it, since
DTDs can give you basic 'bad-input' checking.
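
Python's standard library parses XML but does not validate against a
DTD, so the sketch below swaps in a hand-written check of roughly the
kind a DTD content model expresses. The element names, the rule and
check_textbox are hypothetical, and unlike a real DTD this does not
check element order or repetition.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule, roughly what a DTD line such as
#   <!ELEMENT textbox (text, portrait?)>
# would express: <text> is required, <portrait> is optional,
# and nothing else is allowed.
REQUIRED = {"text"}
ALLOWED = {"text", "portrait"}


def check_textbox(xml_text):
    """Return a list of problems found; an empty list means the input passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag != "textbox":
        return [f"unexpected root element <{root.tag}>"]
    tags = [child.tag for child in root]
    problems = [f"missing <{t}>" for t in sorted(REQUIRED - set(tags))]
    problems += [f"unexpected <{t}>" for t in tags if t not in ALLOWED]
    return problems


print(check_textbox("<textbox><colour/></textbox>"))
# prints ['missing <text>', 'unexpected <colour>']
```

A validating parser would do this declaratively from the DTD itself,
which is the appeal for an import path that has to cope with
hand-edited files.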

David


