Added documentation for regression testing scripts. Not yet complete. #243

Open: wants to merge 2 commits into base: gh-pages
278 changes: 140 additions & 138 deletions documentation/index.html
@@ -8,162 +8,164 @@

{% include navbar.html nav=site.data.navbar %}
<div class="container theme-showcase" role="main">
<h1>Documentation</h1>
<a class="name" name="introduction">
<h2>An introduction to JHOVE</h2>
</a>
<p>
JHOVE provides functions to perform format-specific identification,
validation, and characterization of digital objects.
</p>
<ul>
<li>
Format <em>identification</em> is the process of determining the format to
which a digital object conforms; in other words, it answers the question:
"I have a digital object; what format is it?"
</li>
<li>
Format <em>validation</em> is the process of determining the level of
compliance of a digital object with the specification for its purported format,
e.g.:
"I have an object purportedly of format <em>F</em>; is it?"
<br/>
Format validation conformance is determined at two levels:
<em>well-formedness</em> and <em>validity</em>.
<ol>
<li>
A digital object is well-formed if it meets the purely
syntactic requirements for its format.
</li>
<li>
An object is valid if it is well-formed and it meets additional
semantic-level requirements.
</li>
</ol>
<p>
For example, a <a href="/references#tiff6">TIFF</a> object is
well-formed if it starts with an 8-byte header followed by a sequence of
Image File Directories (IFDs), each composed of a 2-byte entry count,
a series of 12-byte tagged entries, and a 4-byte offset to the next IFD.
The object is valid if it meets certain additional
semantic-level rules, such as that an RGB file must have at least three sample
values per pixel.
</p>
</li>
<li>
Format <em>characterization</em> is the process of determining the
format-specific significant properties of an object of a given format, e.g.:
"I have an object of format <em>F</em>; what are its salient properties?"
<p>
<a class="name" name="repinfo">The set</a> of characteristics reported
by JHOVE about a digital object is known as the object's
<em>representation information</em>, a concept introduced by the Open
Archival Information System (OAIS) reference model
[<a href="/references#oais">ISO/IEC 14721</a>].
The standard representation information reported by JHOVE includes:
file pathname or URI, last modification date, byte size, format,
format version, MIME type, format profiles, and optionally, CRC32, MD5,
and SHA-1 checksums [<a href="/references#crc32">CRC32</a>,
<a href="/references#md5">MD5</a>,
<a href="/references#sha1">SHA-1</a>].
Additional media type-specific representation information is consistent with
the <a href="/references#z39.87">NISO Z39.87</a>
Data Dictionary for digital still images and
the draft AES metadata standard for digital audio.
</p>
</li>
</ul>
<p>
Identification, validation, and characterization actions are frequently
necessary during routine operation of
digital repositories and for digital preservation activities.
These actions are performed by <em>modules</em>.
The output from JHOVE is controlled by <em>output handlers</em>.
JHOVE uses an extensible plug-in architecture; it can be configured at the
time of its invocation to include whichever format modules and
output handlers are desired.
The initial release of JHOVE includes <a href="/modules/" title="JHOVE modules list">modules</a> for
<a href="/modules/bytestream">arbitrary byte streams</a>,
<a href="/modules/ascii">ASCII</a> and
<a href="/modules/utf8">UTF-8</a> encoded text,
<a href="/modules/gif">GIF</a>,
<a href="/modules/jpeg2000">JPEG2000</a>,
<a href="/modules/jpeg">JPEG</a>, and
<a href="/modules/tiff">TIFF</a> images,
<a href="/modules/aiff">AIFF</a> and
<a href="/modules/wave">WAVE</a> audio,
<a href="/modules/pdf">PDF</a>,
<a href="/modules/html">HTML</a>, and
<a href="/modules/xml">XML</a>; and
text and XML output handlers.
</p>

<h2>Tutorial</h2>

<ul>
<li><a href="/getting-started/">Getting started with JHOVE</a> (2015-10-20)
<li><a href="/documentation/parser/">Selecting an XML parser</a> (2007-04-04)
</ul>

<h2>JHOVE API</h2>

<ul>
<li> All JHOVE <a href="/javadoc/">packages and classes</a>
<li> UML class <a href="/img/api.gif">diagram</a>
</ul>

<h2>Best Practice</h2>

<ul>
<li><a href="/documentation/dev-module/">Writing a JHOVE Module</a>
(2005-02-07)
</li>
<li> Writing a JHOVE Output Handler</li>
</ul>

<h2>Schemas</h2>

<ul>
<li> JHOVE output schema <a href="http://hul.harvard.edu/ois/xml/xsd/jhove/jhove.xsd">jhove.xsd</a>
<li> JHOVE configuration file schema
<a href="http://hul.harvard.edu/ois/xml/xsd/jhove/jhoveConfig.xsd">jhoveConfig.xsd</a>
</ul>

<h2>Specifications</h2>

<p>
Standard JHOVE modules:
</p>
<ul>
<li> The <a href="/modules/aiff/">AIFF-hul</a> module (2005-05-09)
<li> The <a href="/modules/ascii/">ASCII-hul</a> module (2004-03-03)
<li> The <a href="/modules/bytestream/">BYTESTREAM</a> module (2004-03-03)
<li> The <a href="/modules/gif/">GIF-hul</a> module (2005-05-09)
<li> The <a href="/modules/html/">HTML-hul</a> module (2005-05-09)
<li> The <a href="/modules/jpeg/">JPEG-hul</a> module (2005-05-26)
<li> The <a href="/modules/jpeg2000/">JPEG2000-hul</a> module (2005-05-26)
<li> The <a href="/modules/pdf/">PDF-hul</a> module (2008-02-25)
<li> The <a href="/modules/tiff/">TIFF-hul</a> module (2005-05-09)
<li> The <a href="/modules/utf8/">UTF8-hul</a> module (2005-05-09)
<li> The <a href="/modules/wave/">WAVE-hul</a> module (2004-12-17)
<li> The <a href="/modules/xml/">XML-hul</a> module (2005-05-09)
</ul>

<ul>
<li><a href="/references/">References</a> (2005-05-09)
</ul>

<h2>Proposals</h2>

<ul>
<li><a href="http://www.jhove2.org/">JHOVE2 homepage</a></li>
<li><a href="http://gicl.cs.drexel.edu/images/c/c2/JHOVE2-proposal.doc">JHOVE2 project proposal</a></li>
</ul>

<h2>For developers</h2>
<ul>
<li><a href="/documentation/build/">Building JHOVE</a></li>
<li><a href="/documentation/testing/">Testing JHOVE</a></li>
</ul>
</div>
{% include footer.html %}
</body>
</html>
70 changes: 70 additions & 0 deletions documentation/testing/index.md
@@ -0,0 +1,70 @@
---
layout: page
title: Testing JHOVE
---
{{ page.title }}
================

Regression Testing
------------------
If you are building JHOVE on a Linux system, you can run regression tests that compare the output of the JHOVE modules
against a standard corpus of test files, using scripts supplied with the JHOVE project. A single supplied script runs a
full regression test of all modules.

That script installs the latest stable release of JHOVE and runs it against the full test corpus to establish the baseline
output; it then builds the latest development version from the project code and runs that over the same test corpus.
Finally, the two sets of output are compared and any differences are recorded.
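The comparison step can be approximated by hand with a recursive `diff` over two output trees. The sketch below is not the project's own comparison logic; the function name `compare_outputs` and its arguments are hypothetical. Note that a plain recursive diff will also flag fields that legitimately change between runs, such as any timestamps in the output, so the project's scripts may handle comparison differently.

```shell
# Hypothetical helper: compare a baseline output tree (stable release) with a
# candidate output tree (development build) and record any differences.
compare_outputs() {
  local baseline_dir="$1"   # output produced by the stable release
  local candidate_dir="$2"  # output produced by the development build
  local report="$3"         # file in which to record the differences
  local status=0

  # diff -r walks both trees; -u gives unified diffs for changed files and
  # "Only in ..." lines for files present in just one tree.
  # Exit status: 0 = identical, 1 = differences found, 2 = trouble.
  diff -r -u "$baseline_dir" "$candidate_dir" > "$report" || status=$?

  case "$status" in
    0) echo "No differences found." ;;
    1) echo "Differences recorded in $report" ;;
    *) echo "diff failed (exit status $status)" >&2 ;;
  esac
  return "$status"
}
```

For example, `compare_outputs /tmp/baseline /tmp/candidate /tmp/diff-report.txt` (paths are placeholders).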

### Running the Full Regression Test
In the root of your JHOVE project, run the script:

```shell
./jhove-bbt/scripts/travis-test.sh
```

The script writes a full trace to standard output.
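Because the trace goes to standard output, it can be useful to keep a copy in a log file while still watching it on screen. The following is a minimal Bash sketch; the function name `run_with_log` and the log file name are hypothetical, and whether `travis-test.sh` reports failures through its exit status is an assumption you should check against the script itself.

```shell
# Hypothetical Bash helper: run a command, show its output, and save a copy.
run_with_log() {
  local log_file="$1"
  shift
  # Send both stdout and stderr to the terminal and to the log file.
  "$@" 2>&1 | tee "$log_file"
  # PIPESTATUS[0] is the exit status of the command itself, not of tee.
  return "${PIPESTATUS[0]}"
}
```

For example, from the project root: `run_with_log regression.log ./jhove-bbt/scripts/travis-test.sh`.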

### Running Regression Tests Manually
#### Establishing a Baseline
The script `jhove-bbt/scripts/baseline-jhove.sh` runs against an existing installation of JHOVE, and can be used to
test a smaller or more focussed test corpus. The usage for the script is:

```
jhove-baseline [-j <pathToJhoveRoot>] [-c <pathToCorpora>] [-o <pathToOutput>] [-h|?]

  -j pathToJhoveRoot  The full path to the root of a JHOVE installation.
  -c pathToCorpora    The path to the root directory of the test corpora.
  -o pathToOutput     The path to the root directory for baseline output.
  -h                  Show usage
  -?                  Show usage
```

The directory specified by `pathToCorpora` should contain at least one descendant directory called "modules". Below
that should be directories whose names match JHOVE module names (note: the "modules" directory does not need to be a
direct child of the `pathToCorpora` directory, but the JHOVE module directories should be direct children of the
"modules" directory). Each of these directories should contain the test files to be analysed by that JHOVE module.

For example, if using `/home/user/testdata` as the `pathToCorpora`, this directory should look like:

```
/home/user/testdata
|-jhove
  |-modules
    |-PDF-hul
      |-TestFile1.pdf
      |-TestFile2.pdf
    |-XML-hul
      |-TestFile3.xml
      |-TestFile4.xml
```
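A small, focused corpus with this layout can be assembled with `mkdir -p` and `cp`. This is a sketch only: the function name `add_to_corpus` is hypothetical, and it uses the intermediate `jhove` directory from the example above, which the layout rules do not require.

```shell
# Hypothetical helper: copy test files into <corpora>/jhove/modules/<module>/.
add_to_corpus() {
  local corpora_root="$1"  # the directory you will later pass with -c
  local module="$2"        # a JHOVE module name, e.g. PDF-hul or XML-hul
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "No test files given" >&2
    return 1
  fi
  local target="$corpora_root/jhove/modules/$module"
  # Create the module directory (and any missing parents) if needed.
  mkdir -p "$target"
  # Copy each remaining argument (a test file) into the module directory.
  cp -- "$@" "$target/"
}
```

For example: `add_to_corpus /home/user/testdata PDF-hul ~/samples/TestFile1.pdf ~/samples/TestFile2.pdf` (source paths are placeholders).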

The script creates an XML file in the output directory for each test file in the corpus, mirroring the corpus directory
structure. For example, if you use `/home/user/output` as the `pathToOutput` value, you will see output like:

```
/home/user/output
|-jhove
  |-modules
    |-PDF-hul
      |-TestFile1.pdf.jhove.xml
      |-TestFile2.pdf.jhove.xml
    |-XML-hul
      |-TestFile3.xml.jhove.xml
      |-TestFile4.xml.jhove.xml
```
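Given this naming convention, a quick sanity check after a baseline run is to confirm that every test file has a matching `.jhove.xml` file at the same relative path. The Bash sketch below assumes exactly the mirroring shown above; the function name `check_outputs` is hypothetical.

```shell
# Hypothetical Bash helper: report corpus files with no matching output file.
check_outputs() {
  local corpora_root="$1"  # the directory passed with -c
  local output_root="$2"   # the directory passed with -o
  local missing=0 rel file
  # Visit every regular file below a "modules" directory in the corpus.
  while IFS= read -r -d '' file; do
    # Path of the test file relative to the corpus root.
    rel="${file#"$corpora_root"/}"
    if [ ! -f "$output_root/$rel.jhove.xml" ]; then
      echo "Missing output for $rel"
      missing=$((missing + 1))
    fi
  done < <(find "$corpora_root" -path '*/modules/*' -type f -print0)
  # Succeed only if every test file has an output file.
  [ "$missing" -eq 0 ]
}
```

For example: `check_outputs /home/user/testdata /home/user/output`.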

To run this baseline script against the full test corpus, you can run:

```shell
${JHOVE_PROJECT_DIR}/jhove-bbt/scripts/jhove-baseline.sh -j ${JHOVE_INSTALL_DIR} -c ${JHOVE_PROJECT_DIR}/test-root/corpora -o ${OUTPUT}
```
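The same invocation can be wrapped so that missing directories are caught before the script starts. This is a sketch under assumptions: the function name `run_baseline` is hypothetical, it uses the script path exactly as it appears in the command above, and creating the output directory up front is a precaution rather than a documented requirement.

```shell
# Hypothetical wrapper around the baseline command shown above.
run_baseline() {
  local project_dir="$1"   # JHOVE project checkout (JHOVE_PROJECT_DIR)
  local install_dir="$2"   # existing JHOVE installation (JHOVE_INSTALL_DIR)
  local output_dir="$3"    # where baseline output goes (OUTPUT)
  local corpora="$project_dir/test-root/corpora"
  local dir
  # Fail early with a clear message if an input directory is missing.
  for dir in "$install_dir" "$corpora"; do
    if [ ! -d "$dir" ]; then
      echo "Not a directory: $dir" >&2
      return 1
    fi
  done
  mkdir -p "$output_dir"
  "$project_dir/jhove-bbt/scripts/jhove-baseline.sh" \
    -j "$install_dir" -c "$corpora" -o "$output_dir"
}
```

For example: `run_baseline "$HOME/src/jhove" "$HOME/jhove" "$HOME/jhove-baseline-output"` (all three paths are placeholders).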
10 changes: 9 additions & 1 deletion index.html
@@ -38,8 +38,16 @@ <h2>Software</h2>
<h2>Getting started</h2>
</a>
<p>
The getting started guide is <a href="/getting-started/">on this site</a>.
</p>

<a class="name" name="documentation">
<h2>Documentation</h2>
</a>
<p>
More detailed documentation is also available <a href="/documentation/">on this site</a>.
</p>

<a class="name" name="license">
<h2>License</h2>
</a>