The code base is in a very early stage of development (beta). All the usual disclaimers apply. Use at your own risk.
Download: [ shoebox.tar.gz ]
Shoebox data files are marked up in Standard Format, a loosely defined data format that is largely obsolescent (thanks to XML). As with XML, it is useful to distinguish two levels of correctness for Shoebox data files: well-formed and valid.
Well-formedness is therefore a necessary (but not sufficient) condition for validity.
Without validating information from a Shoebox metadata file, the following assumptions will be made about standard format:
Some examples of well-formed and ill-formed Shoebox data are provided below:
| Well-Formed |
| |
| Ill-Formed |
\ref orang \ps N \ge person |
\1 orang \2 N \3 \4 person |
\ref orang \ \ge person \ge people |
Some special field markers reserved for use by Shoebox violate the rule concerning the initial character of a field marker, for example \+mkrset and \-mkrset. So far, the only exceptional initial characters seem to be + and -. The rule for the first character might therefore be relaxed.
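The relaxation suggested above can be sketched as a small check on marker names. This is only an illustration and not part of the shoebox modules: the function name `is_valid_marker`, the `relaxed` flag, and the assumption that the strict rule requires an alphabetic first character are all hypothetical.

```python
import re

# Assumption: under the strict rule a marker name (without the backslash)
# must start with a letter and contain no whitespace.
STRICT_MARKER = re.compile(r"^[A-Za-z]\S*$")

# Relaxed rule: additionally allow a single leading + or -, as seen in
# Shoebox's reserved markers \+mkrset and \-mkrset.
RELAXED_MARKER = re.compile(r"^[+-]?[A-Za-z]\S*$")


def is_valid_marker(marker, relaxed=False):
    """Return True if the marker name satisfies the (assumed) naming rule."""
    pattern = RELAXED_MARKER if relaxed else STRICT_MARKER
    return bool(pattern.match(marker))


if __name__ == "__main__":
    for name in ["ge", "1", "+mkrset", "-mkrset", ""]:
        print(repr(name), is_valid_marker(name), is_valid_marker(name, relaxed=True))
```

Keeping the strict and relaxed patterns separate makes it easy to report reserved Shoebox markers differently from genuinely ill-formed ones.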
There are three ways of parsing a Shoebox file into entries and their associated fields, depending on the information available:
| Available Information | Description | Can be Parsed? | Can be Validated? |
| No Metadata | If no metadata is available, it is assumed that the first field of the first entry encountered is the head field. | Yes | No |
| Head Field Known | If the head field marker is known, entries can be delimited reliably and multiline fields can be handled properly. | Yes | No |
| Shoebox Metadata | With the full metadata available (i.e., *.typ file), it is possible to parse the data file and ensure that the contents conform to the metadata constraints. | Yes | Yes |
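The first two rows of the table can be illustrated with a minimal sketch. It is not the API of the shoebox modules: the names `parse_fields` and `group_entries` are hypothetical, and it assumes that a field starts on a line beginning with a backslash marker and that any following non-blank line without a marker continues the previous field.

```python
import re

# Assumption: a field line is a backslash, a marker name, and an optional value.
FIELD_START = re.compile(r"^\\(\S+)(?:\s+(.*))?$")


def parse_fields(text):
    """Split Standard Format text into (marker, value) pairs.

    Non-blank lines that do not start with a marker are treated as
    continuations of the preceding field (multiline fields).
    """
    fields = []
    for line in text.splitlines():
        match = FIELD_START.match(line)
        if match:
            fields.append([match.group(1), match.group(2) or ""])
        elif fields and line.strip():
            fields[-1][1] = (fields[-1][1] + " " + line.strip()).strip()
    return [tuple(field) for field in fields]


def group_entries(fields, head_marker=None):
    """Group fields into entries, starting a new entry at each head field."""
    if not fields:
        return []
    if head_marker is None:
        # No metadata: assume the first field encountered is the head field.
        head_marker = fields[0][0]
    entries = []
    for marker, value in fields:
        if marker == head_marker or not entries:
            entries.append([])
        entries[-1].append((marker, value))
    return entries


# Made-up sample data, for illustration only.
sample = "\n".join([
    r"\lx orang",
    r"\ps N",
    r"\ge person",
    "   (a human being)",
    "",
    r"\lx rumah",
    r"\ps N",
    r"\ge house",
])

if __name__ == "__main__":
    for entry in group_entries(parse_fields(sample)):
        print(entry)
```

Passing `head_marker="lx"` explicitly corresponds to the "Head Field Known" case; omitting it falls back to the "No Metadata" assumption. Validation against a *.typ file is deliberately left out of this sketch.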
When a full metadata description of a Shoebox file is available, the Shoebox parser can validate its contents against their metadata specification. All validation errors extend from the class ShoeboxValidationError. The following validation errors are recognized:
A good deal of the functionality for Shoebox per se (rather than Standard Format) has been reverse-engineered. If anyone is aware of explicit specifications for the make-up of Shoebox metadata, please contact the author.
You can test the Standard Format parser by running some test scripts (bin.tar.gz) on the sample data that comes with Shoebox (samples.tar.gz). Make sure that Python can find the shoebox directory containing the Shoebox modules by adding it to the environment variable PYTHONPATH.
For purposes of illustration, assume that all of the above-linked files reside in a directory called foo in your home directory. The following Bash session illustrates how to run a sample script:
    ~/foo $ ls
    bin.tar.gz samples.tar.gz shoebox.tar.gz
    ~/foo $ gunzip bin.tar.gz
    ~/foo $ tar xf bin.tar
    ~/foo $ gunzip samples.tar.gz
    ~/foo $ tar xf samples.tar
    ~/foo $ gunzip shoebox.tar.gz
    ~/foo $ tar xf shoebox.tar
    ~/foo $ rm -f *.tar
    ~/foo $ ls
    bin samples shoebox
    ~/foo $ export PYTHONPATH=$PYTHONPATH:~/foo/shoebox/
    ~/foo $ python bin/print-shoebox.py -s samples/Frisian1/FriRt.dic
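If setting PYTHONPATH in the shell is inconvenient, the module directory can also be added to Python's search path from within a script. The following is a minimal sketch using only the standard library; the helper name `add_module_dir` is hypothetical, and the path `~/foo/shoebox` simply mirrors the example layout above.

```python
import os
import sys


def add_module_dir(path):
    """Append a directory to sys.path (after expanding ~) if it is not already there."""
    full_path = os.path.abspath(os.path.expanduser(path))
    if full_path not in sys.path:
        sys.path.append(full_path)
    return full_path


if __name__ == "__main__":
    # Assumes the example layout from the Bash session above.
    print(add_module_dir("~/foo/shoebox"))
```

This only affects the running interpreter, so it has to be called before the Shoebox modules are imported.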
The following table lists test commands for each of the three parsing scenarios:
| Available Information | Test Scripts |
| No Metadata | $ python bin/print-shoebox.py -s samples/Frisian1/FriRt.dic |
| | $ python bin/print-shoebox.py -s samples/Frisian2/FriRt.dic |
| | $ python bin/print-shoebox.py -s samples/Axint/Ax.lex |
| Head Field Known | $ python bin/print-shoebox.py -s samples/Frisian1/FriRt.dic -f lx |
| | $ python bin/print-shoebox.py -s samples/Frisian2/FriRt.dic -f lx |
| | $ python bin/print-shoebox.py -s samples/Axint/Ax.lex -f lx |
| Shoebox Metadata | $ python bin/print-shoebox.py -s samples/Frisian1/FriRt.dic -m samples/Frisian1/FrisianD.typ |
| | $ python bin/print-shoebox.py -s samples/Frisian2/FriRt.dic -m samples/Frisian2/FrisianD.typ |
| | $ python bin/print-shoebox.py -s samples/Axint/Ax.lex -m samples/Axint/Axininc2.typ |
For more examples and explanation, see the tutorial.
The author may be contacted at Stuart DOT Robinson AT mpi DOT nl.