Digital Humanities vs. Prof. Stanley Fish

Wednesday, March 13, 2013

The REVISED SCRIPT

I. Introduction

The perl script below, which extracts the lexical data from the Areopagitica, is divided into three parts.

1. The first takes a raw ascii file one can get from Project Gutenberg or several other Milton sites and formats a word list.

http://www.constitution.org/milton/areopagitica.htm : This site has a ascii file of the Areopagitica with no markup ready to load into your word processor.

http://www.gutenberg.org/ebooks/608 : The Gutenberg sites has several formats of the Areopagitica, plain text, Kindle, EPUB, Mobile and others.

The source in this case had minimal markup, really just a few paragraph markers for 18,000 words. The only required pre-processing was to mark each period that actually was a sentence cusps. That task was not just a global replace but required inspecting each period and excluding abbreviations but also numbers such a V. in Leo V. Other problems revolved around that questions and answers appear as one sentence. My strategy was to replace each period with a XXX, crude but effective.

Excerpt from xae.txt, directly BELOW. The pre-processing involves adding XXX as sentence cusps:

"They who to States and Governours of the Commonwealth direct their Speech, High Court of Parlament, or wanting such accesse in a private condition, write that which they foresee may advance the publick good; I suppose them as at the beginning of no meane endeavour, not a little alter'd and mov'd inwardly in their mindes: Some with doubt of what will be the successe, others with fear of what will be the censure; some with hope, others with confidence of what they have to speake.
XXX
And me perhaps each of these dispositions, as the subject was whereon I enter'd, may have at other times variously affected; and likely might in these formost expressions now also disclose which of them sway'd most, but that the very attempt of this addresse thus made, and the thought of whom it hath recourse to, hath got the power within me to a passion, farre more welcome then incidentall to a Preface.
XXX
Which though ..."

NOTE: each sentence is separated by a cusp. The counting and printing of page numbers comes late in the process. For now, each processing run starts with the raw cusp file. The purpose is to allow adjustments to the initial input source without having to go through elaborate copying routines each time the source is changed. Working from the wordlist and updating the wordlist when the source is changed just adds unnecessary confusion.

During the analysis and re-reading printed pages do have a place, at least in my world.

The word list puts each word on a line with its sequential number in the text:

1 - They
2 - who
3 - to
4 - States
...
84 - to
85 - speake.

words in sen - 85
86 - And

The individual words are each on its own line with its sequential number in the text, e.g. "1 - They", followed by "2 - who" on the next line. Sentence cusps are marked. The previous sentence is closed and the total number of words in the sentence are appended. The beginning of sentences are also marked in traditional mark-up.

There are several reasons why I generally start with a numbered word list when I process a text. The text has to be brought under the control of an algorithm. There are many ways of doing that and one is generally no better then the next. For me the idea of one word on a line comes from the old unix days when one could execute a small script at the command line that would replace each space with a line feed. That list could be passed to the uniq filter that would remove duplicate lines and count the duplicates which could be passed a sort that formats result in descending order. Thus, the command "step1" would create output along the lines:

320 - and
205- the
190 - a ...

From there it was just another small script to collect all the sequential numbers of the individual words, producing a vector through the text.

dich,1
258692,
dichtungen,1
086119,
die,9768
000030, 000122, 000130, 000133, 000149, 000162, 000177, 000186, 000189, 000221, 000241
000248, 000253, 000257, 000270, 000293, 000302, 000344, 000368, 000372, 000390, 000401
000406, 000463, 000505, 000520, 000524, 000541, 000576, 000581, 000590, 000641, 000697
...

In this example from the text of Husserl's Logical investigations the three words, their totals and the following vector are shown. In the case of "die" - generally an article (the) but with other grammatical functions - there are close to 10,000 instances, the vector starts at 30. This set of numbers, derived from the sequential word list can lead to many different statistical calculations.

This particular function does not really interest us here with the Areopagitica. It would require too much pre-processing given the orthographic irregularities. Consequently, I plan to stick to the bi-labials for now.

2. The second part of the script extracts five different views of BP's.

a. the individual sentences, example, sentence 4:

213 - liberty
217 - hope,
234 - expect;
235 - but
237 - complaints
241 - deeply
244 - speedily
250 - bound
253 - liberty
SEN 004 BP 09 STOT 052 PCT BP/STOT 0.17

Each sentence has three features, the opening, the data, and a summary in the closing: the sentence number, the number of BP's [9], the total number of words [52], and the percentage of BP's [17%].

b. a summation of the data on each sentence, fore example: sentences 221 to 227:

SEN 221 BP 03 STOT 030 PCT BP/STOT 0.10
SEN 222 BP 06 STOT 033 PCT BP/STOT 0.18
SEN 223 BP 08 STOT 055 PCT BP/STOT 0.15
SEN 224 BP 13 STOT 047 PCT BP/STOT 0.28
SEN 225 BP 14 STOT 087 PCT BP/STOT 0.16
SEN 226 BP 13 STOT 088 PCT BP/STOT 0.15
SEN 227 BP 06 STOT 036 PCT BP/STOT 0.17

This output merely tabulates the summaries from the previous output.

c. a reformat of the summaries to allow easy sorting by the "percentage pf BP's":

[here sorted by highest percentage on top]

0.31% SEN 102 BP 04 STOT 013
0.31% SEN 038 BP 15 STOT 049
0.31% SEN 023 BP 10 STOT 032
0.30% SEN 091 BP 07 STOT 023
0.29% SEN 105 BP 06 STOT 021
0.28% SEN 224 BP 13 STOT 047
0.27% SEN 163 BP 06 STOT 022
0.27% SEN 114 BP 10 STOT 037

d.. a reformat of the summaries to allow easy sorting by the "length of sentences":

[here the five sentences with the most words]
197 29 0.15 - 186
165 31 0.19 - 292
154 19 0.12 - 335
153 21 0.14 - 183
152 21 0.14 - 243

Sentences and Data in table Format

e. the summaries in html form:

3. The third and last part of the script collects data in arrays and prints out views that depend on calculations. The previous parts have generated output while to processing occurred. The final part parks the data in arrays and then extracts information and makes calculations on data that was not ready till the sequential processing was complete.

For example, the output below shows the gap between the previous and the next gap. It thus pinpoints a series of three consecutive PB's. By following short or long gaps one can investigate dense or sparse occurrences of BP's. It may well be possible to do some vector razzle-dazzle, in this case the rubber meets the road in rereading passages to see if some sound pattern emerges.

16, 012, 000012, 004, Speech,
11, 004, 000016, 007, Parlament,
17, 007, 000023, 010, private
13, 010, 000033, 003, publick
08, 003, 000036, 005, suppose
22, 005, 000041, 017, beginning
21, 017, 000058, 004, doubt
13, 004, 000062, 009, be
14, 009, 000071, 005, be
14, 005, 000076, 009, hope,
12, 009, 000085, 003, speake.

The table above shows the first sentence. 12 is the distance to the first BP from the beginning of the file. 16 is the second word, 23 the third. Thus, the distance from the word "private" to its two neighbors is 17. That is the gap.

10, 006, 000213, 004, liberty
21, 004, 000217, 017, hope,
18, 017, 000234, 001, expect;
03, 001, 000235, 002, but
06, 002, 000237, 004, complaints
07, 004, 000241, 003, deeply
09, 003, 000244, 006, speedily
09, 006, 000250, 003, bound
16, 003, 000253, 013, liberty

The table above shows sentence four. There you can see some dense occurrences. The point of this table is to sort on the basis of the gap.

SEN 4
diff 6 213 - liberty
diff 4 217 - hope,
diff 17 234 - expect;
diff 1 235 - but
diff 2 237 - complaints
diff 4 241 - deeply
diff 3 244 - speedily
diff 6 250 - bound
diff 3 253 - liberty
SEN 004 BP 09 STOT 052 PCT BP/STOT 0.17

Above you can see the series of differences: 1, 2, 4, 3, 6, 3. The arrays can be tapped for any number of views depending on questions that arise during the reading. It is clear that at this early stage, this one script can give insight only into the distribution of BP's. That is why it was written.

II. THE SCRIPT

<br />
<code>
#!c:\perl

open (H, "xae.txt");
open (OUT, ">ywordlist.txt");
print OUT ("<sen 1>\n");

my $i=0;
my $j=0;
my $c=0;
my $ln=1;

while ($line = <h>){ #########start reading TEXT

chomp $line;

if ($line =~ /XXX/){
$j=$i-$j;

print OUT ("</SEN $ln> words-> $j\n");
if ($ln =~ /347/){
last;
}

$j=$i;
$ln++;
print OUT ("<sen $ln>\n");
next;
}

my @f = split(/ /,$line);     # split line

foreach $f (@f){ 
chomp $line;
$i++;
print OUT "$i - $f \n";

}
}

close H;
close OUT;

########################### PART 2 ######################

open (H, "ywordlist.txt");
open (OUT, ">ypbnum.txt");
open (SUM, ">ypbsum.txt");
open (PCT, ">ypbpct.txt");
open (LEN, ">ypblen.txt");
open (LEN2, ">ypblen.htm");

print LEN2 "<table border=1>";    ###close table to trick blogger </table>
print LEN2 "
<tr><td>Wds in Sen</TD>
<td>BP in Sen</TD>
<td>PCT BP/Wds</TD>
<td>SEN</TD></TR>
";

my $i=0;

while ($line = <h>){

if ($line =~ /<SEN/){
print OUT $line;
next;
}

if ($line =~ /\/SEN/){
chomp $line;
$line =~ s/>//g;
my @f = split(/ /,$line);     # split line
$sen=$f[1];
$sentot=$f[3];
$pct=($i / $sentot);
print "2 $sen, $k \n";
printf OUT 
("</ SEN %03d BP %02d STOT %03d PCT BP/STOT %1.2f >\n",
 $sen, $i, $sentot, $pct);
printf SUM 
("SEN %03d BP %02d STOT %03d PCT BP/STOT %1.2f\n",
 $sen, $i,$sentot, $pct);
printf PCT 
("%1.2f% SEN %03d BP %02d STOT %03d \n",
  $pct, $sen, $i, $sentot);
printf LEN 
("%03d %02d %1.2f # %03d \n",
 $sentot, $i, $pct, $sen);

printf LEN2 ("
<tr ALIGN=CENTER>
<td>%03d</TD><td>%02d</TD>
<td>%1.2f</TD>
<td>#%1.0f</TD></TR>
\n", $sentot, $i, $pct, $sen);

$i=0;
next;
}

if ($line =~ /b/) {
$i++;
print OUT ("$line");
next;
}

if ($line =~ /p/) {
$i++;
print OUT ("$line");
next;
}

if ($line =~ /B/) {
$i++;
print OUT ("$line");
next;
}

if ($line =~ /P/) {
$i++;
print OUT ("$line");
next;
}

}

close OUT;
close H;

###################################
### PART 3 ######################## 
###################################

open (G, "ypbnum.txt");
open (OUT, ">ypbnum2.txt");
open (SRT, ">ypbnunsrt.txt");

my $flag = "one";

while ($line = <g>){

if ($line =~ /<SEN/){
chomp $line;
my @sennum = split(/ /, $line);

$sen=$sennum[1];
$sen =~ s/>/ /g;

print OUT ("1 $line\n");
next;
}

if ($line =~ /\/ SEN/) {
print OUT ("2 $line\n");
my @s = split(/ /, $line);
$sen=$s[2];
$bp=$s[4];
$wds=$s[6];
$pct=$s[9];
$holddat= ("$sen,  $bp, $wds, $pct");

foreach $holdlist (@holdlist){
$x="$holdlist, $holddat, \n";
print SRT $x;
}
undef @holdlist;
next;
}

my @numb = split(/ /, $line);
$hold=$numb[0];
$word=$numb[2];

if ($flag eq "one"){
<$flag = "two";
$diff =($hold);
$prev = $hold;
} else {
$diff=$hold-$prev;
$prev=$hold;
}

print OUT ("diff ");
print OUT ("$diff ");
print OUT ("$line");
$fixdiff=sprintf("%03d",$diff);
$wd="$fixdiff, $line";

push (@pdiff, $fixdiff);
push (@mdiff, $fixdiff);
push (@pos, $hold);
push (@word, $word);

chomp $wd;
push (@holdlist, $wd);

}

close OUT;
close G;

my $z=0;
my $hold=0;
my $y=0;
my $i=0;

open (OUT, ">ypbnum3.txt");
open (W, ">ypbnum4.txt");

for ($i=0;$i<2649;$i++){
$test1=$pos[$i];
$test=$test1-$hold;
$count=0;
$test=$test-$m;
while ($count < $test) {
print W ("$y,$z\n");
$count++;
$y++;
}

$hold=$test1;

$pos=$pos[$i];
$plus=$pdiff[$i+1];
$minus=$mdiff[$i];
$word=$word[$i];
$span=$plus+$minus;
$span=sprintf("%02d",$span);
$pos=sprintf("%06d",$pos);
print OUT ("$span, $minus, $pos, $plus, $word\n");
print W ("$y,$span\n");
$y++;
$m=1;
}
</code>

a
a
a
a
a
a
a
a
a
a
a
a
a
a
a

Thursday, March 7, 2013

Refining the Question: Bilabial Stops in Milton's Areopagitica. (and some fricatives)

Consecration of Hermagoras by Peter, Aquiela Basilica

I have maintained that the examination of bi-labials in the Areopagitica is a wild-goose chase. I apologize for the thoughtless affront to wild-geese and their chasers and for the careless use of English.

It turns out that the concept wild-goose chase is quite complex and subtle, although it has lost some subtlety in modern usage - as illustrated by my own thoughtless use. I used the term to indicate an essentially pointless endeavor involving considerable effort with no tangible result, i.e. the goose proved uncatchable.

This is not quite true to the history of the term. From wiki we learn that it originally referred to a type of horse race where the racers had to follow the tracks of the lead horse, at intervals, I have read elsewhere. The "wild-goose" reference refers to the unpredictable course of the leader, from the perspective of the followers and the imperative for the followers to follow the leader precisely. It is possible that in the 16th c. the people were still close enough to nature, to ponds and geese, that we could expect the behavior of wild-geese to be used metaphorically in a trustworthy manner.

In Shakespeare, the term seems to be used to indicate a path difficult to follow, e.g. a complex argument.

Wild Geese Descending to a Sandbank

All the wild-geese I have ever seen, the one's at the pond behind my house in North Carolina and the one's flying over the swamps around Princeton have always displayed very predictable paths. Generally they seem to fly in a straight line with the followers arranged geometrically behind the leader in a nice V-formation. Landings tend to be very graceful curving maneuvers into open water. I have yet to see a wild-goose engage in erratic flight behavior. Of course, at a distance, it is difficult for a layperson to differentiate ducks and geese.

So where does the metaphor originate. Did people in the 16c. surprise wild-geese on the ground and try to catch them only to have them fly off in all directions? Pigeons or even chickens might do the same thing. Did wild-geese, on the ground, being chased, change directions while starting the run to its flight path which must surely be a straight line given the effort to attain height and speed? Would a wild-rabbit chase be more to the point? Of course what human would risk humiliation in chasing rabbits? Hopeless. Perhaps the modern usage represents several centuries of experiments by humanity in chasing wild-geese, all of which failed abjectly due to a fast run up and a predictable flight path, up, up and away; hence the adjustment in meaning.

Did riders in the 16c., lacking beagles and a handy fox on occasion, actually chase flying flocks of geese, simply for the sport of the chase and for practice when the beagles would be brought out? So it was never really about a goose, it was about exercising the horses.

Perhaps the real metaphor should be wild goose-chase, i.e. the chase of a domesticated goose that turned "wild" because the goose did its best not to be caught, again behaving very much as a chicken would. Flying is not an option since the wings had been clipped. That does not fit the metaphor since the goose invariably lands in the roasting pan, though the human effort may have been considerable.

Given the general lack of experience of the literate populace with geese, wild or otherwise, perhaps the thought of geese wildly careening across the evening sky is a product of fancy in its 17c. meaning, spun from no observed data, like the ideas so many lit. crit. graduate students and their mentors in their discussion of Milton. It is based on lack of experience with the real thing, Geese in the former and the actual referentially ambiguous words in the text before us in the latter case.

However that may be, let me try to focus on the examination of bi-labials in the Areopagitica. This would also serve to differentiate the approach of a world famous literary critic, august scholar, if you please, from the efforts of a retired Digital Humanities perl programmer writing scripts extracting patterns of words from a text.

For Professor Fish, living his discrete situation, being surrounded by a vast collection of interpretive mechanisms and conceptual building blocks suited for interpretation, the metaphor should be the wholesale slaughter of geese. Let me remind you of his methodological snippet:

The direction of my inferences is critical: first the interpretive hypothesis and then the formal pattern, which attains the status of noticeability only because an interpretation already in place is picking it out. [Fish]

The professor has both barrels loaded and on a hair-trigger, the first with an intimate knowledge of rhetorical forms such as chiasmus and the second with intimate knowledge of the history of Milton's time, specifically the evolution of church hierarchies. Of course a mild dispepsic spasm could unleash an incidental interpretation. To be more accurate, the professor has countless guns at the ready, sitting in his blind waiting for the ducks to pass over. The action is lightning quick, the "Bishop-Presbyter" ducks appear, bam bam, and the ducks fall lifeless into the water. An interpretation has been formed joining church politics with rhetorical forms. The critic is habitually crouched in the interpretive pose. The interpretation simply pops forth. Explanation and justification follows, metaphorically, talking to the game wardens, some of whom question the validity of the hunting license or assert the expiration of the season or the overstepping of the bag limit or the unsporting use of an automatic weapon without rational controls.

William Laud, Archbishop

We have looked at the interpretation. Now let us look at the "interpretive hypothesis" [from above]. The "formal pattern" [above], e.g. the BP's, attain noticeability because an interpretation is in place. From my very incomplete grasp of Milton, this interpretation, cryptic though it is, cannot get full marks, C+ on the American scale. Why? You ask? Is your task to isolate a common, even secondary or tertiary theme? Were Bishops and Presbyters active in pre-publication censorship? Duh. Is that your interpretive hypothesis? Are you reading the Areopagitica looking for evidence of religious strife manifesting in censorship? Have you spotted the formal pattern of the bi-labial chiasmus to nail 17c. religious strife to the wall? In my view, you have picked a commonplace of 17c. history and unearthed an extremely unlikely "formal pattern" to prove something no one would deny. C+. Too easy, too obscure, too peripheral to the Areopagitica., in short, a rewrite.

In any case, an interpretation has been put into the world. Prof. Fish interprets easily, the only question is of the dozens of interpretations that offer themselves every day, what gets written down. There can be no real mysteries in the world of Fish, if there are, the public persona does not show it. Everything exists to be explained. The voice is practically bursting forth, be it the Areopagitica or the Academy Awards winner for the best movie. The world wants his opinions. The judgments are absolute: this is that, a connection has been made, read and learn. I see my function to give this mechanism a much needed service, a tweaking.

Bishop Laud's Trial

What if, upon opening the game-bag, the reader of an interpretation finds not a goose but a cuckoo bird. What if, the greatness of the critic notwithstanding, the interpretation appears nonsensical; in addition, the interpretation is in service of undermining the reader's field of work, computer work with humanities texts. The emotions that swept through the Digital Humanities community last New Years were hurt, betrayal, bewilderment, abashment, confusion. The temptation is to ignore as literally hundred things are ignored in the course of a single day, every day, starting with the fact that it may be cold and raining.

The digital humanist, in general, is a less public figure than the super-star literary critic. There are loaded weapons at the ready in the digital world, but they are not designed to slay geese. The act of killing something and having Rover go fetch is a fairly swift action. The eye sees, the finger squeezes, Rover heads for the splash and the interpretation is in the bag. The copy editors come running. The great one's have this capacity of turning a life of experience into gems of interpretation.

The digital humanist has no instant access to such treasure. The facility is more along the lines of a cartographer, mapping the lay of the land, finding out where the ducks may be and what are their flight patterns. There are months of meeting to lay out data-base structures. There is no assumption that ducks will be put into the bag tomorrow or in a month.

There is a possibility that after the map has been drawn that an interpretation will arise. It may be possible that someone else, not involved in fetching forth the data may hit upon something interpretable. In my case, in chasing bi-labials in a fairly non-metaphoric linear fashion, the result has been lists of bilabials with various labels attached, sentence number, sequential position in the text, distance to the next bi-labial and a few more.

The creation of lists involves an inherent progression from a most pedestrian beginning, a sequential list of words, to a final display, at present, which shows the words with the position in the text and the distance behind and ahead to the next bilabial that can be sorted by the gap each bi-labial straddles. Thus it is easy to identify big gaps and small gaps. One would assume that the sonorous prosody of BP's could not survive a gap of 40 to 60 words. Gaps of three, four, even eight words in sequence, on the other hand, could certainly be read to emphasize a pattern of sound. Perhaps.

There is a system in taking a sequence of 18,000 lexical items and extracting numerical data on the interrelations of the words with specific content. It is even convenient that this exercise is empty of meaning, I can concentrate on the mechanics. Textbooks have been written on this field and entry is possible at various levels of virtuosity, perl being one of the easier.

The last redesign of the output was caused by my recognition that I had concentrated exclusively on following bi-labials from one to the next to the next. In other words, I had accepted the forward motion of text, concentrating on the distance from the previous bi-labial to the next. The algorithmic logic that does that is also easier since no values have to be passed backwards. What would be more important to analysis, assuming there is something to analyze, would be the gap which each word straddles. That requires holding the data of the previous BP in stasis, while the next TWO are collected and printed out with the middle one along the lines: previous BP, BP to be printed, next BP.

For example, for starters, the first sentence contains 11 bi-labials.

[NOTE: there may well be some fricatives hiding among the plosives. But you can recognize them.]

12 - Speech,
16 - Parlament,
23 - private
33 - publick
36 - suppose
41 - beginning
58 - doubt
62 - be
71 - be
76 - hope,
85 - speake.
SEN# 001 BP 11 STOT 085 PCT BP/STOT 0.13

"Speech" is the first BP. It is the 12th word of the actual text, the first sentence of the oration as such, dismissing the front matter for now. The first sentence contains 11 BP's out of a total of 85 words, a percentage of 13 (11/85), e.g. 13%.

The sentence in question is below.

|p1
They who to States and Governours of the Commonwealth direct their Speech, High Court of Parlament, or wanting such accesse in a private condition, write that which they foresee may advance the publick good; I suppose them as at the beginning of no meane endeavour, not a little alter'd and mov'd inwardly in their mindes: Some with doubt of what will be the successe, others with fear of what will be the censure; some with hope, others with confidence of what they have to speake.

You can see that digital humanities as I practice it is quite tedious. All that seems to be happening is that one is asked to read and appreciate lists.

A more interesting list is:

diff 12 12 - Speech,
diff 4 16 - Parlament,
diff 7 23 - private
diff 10 33 - publick
diff 3 36 - suppose
diff 5 41 - beginning
diff 17 58 - doubt
diff 4 62 - be
diff 9 71 - be
diff 5 76 - hope,
diff 9 85 - speake.
SEN 001 BP 11 STOT 085 PCT BP/STOT

Here we can see the numerical relation (difference) to the previous BP. "Speech" is 12 words from the beginning - four more to the next BP. The last line summarizes the data for the sentence:

1. number of sentence (1),
2. BP's (11),
3. total words in sentence (85),
4. percentage (13).

In sentence #4 we have 52 words, 9 BP's, 17%. In addition we can see some fairly close proximities of BP's caused by a single gap of 17. The gap of 17 is large also because the nex BP is the next word. Thus the gap is completely behind the word. Looking at the small gaps from "expect - 234" to "liberty - 253" we get a percentage of 37. The cluster 234 to 244 reaches 50%. Since I don't really know if such clustering of bi-labials is unusual, in Milton's time or in our time, I will just assert that there ARE clusters of bi-labials. They can be clearly pinpointed by browsing the list. The list is around 2500 items, easy to sort, easy to scroll, easy to find the sentence in question - assuming some minor virtuosity and willingness - kazoo, not violin.

diff 6 213 - liberty
diff 4 217 - hope,
diff 17 234 - expect;
diff 1 235 - but
diff 2 237 - complaints
diff 4 241 - deeply
diff 3 244 - speedily
diff 6 250 - bound
diff 3 253 - liberty
SEN004 BP 09 STOT 052 PCT BP/STOT 0.17

Below the fourth sentence for reference.

|p4
For this is not the liberty which wee can hope, that no grievance ever should arise in the Commonwealth, that let no man in this World expect; but when complaints are freely heard, deeply consider'd and speedily reform'd, then is the utmost bound of civill liberty attain'd, that wise men looke for.

The latest, and probably last view calculates the span between BP's.

16, 012, 000012, 004, Speech,
11, 004, 000016, 007, Parlament,
17, 007, 000023, 010, private
13, 010, 000033, 003, publick
08, 003, 000036, 005, suppose
22, 005, 000041, 017, beginning
21, 017, 000058, 004, doubt
13, 004, 000062, 009, be
14, 009, 000071, 005, be
14, 005, 000076, 009, hope,
12, 009, 000085, 003, speake.

In sentence four (below) you can see the sequence of single digit spans.

10, 006, 000213, 004, liberty
21, 004, 000217, 017, hope,
18, 017, 000234, 001, expect;
03, 001, 000235, 002, but
06, 002, 000237, 004, complaints
07, 004, 000241, 003, deeply
09, 003, 000244, 006, speedily
09, 006, 000250, 003, bound
16, 003, 000253, 013, liberty

One last topic has to be covered: graphical output.

I am no great fan of graphical output in text research. The temptation is to show a graph with the assumption that spikes mean something more than a grotesque hair-do. I prefer to look at the low gap numbers in sentence four (table directly above) and immediately go to the sentence. Spikes and troughs are fine as long as they lead to an examination of the sentences forming the features.

On some level of visionary blue sky, I do wish we could run all our text through some cross between Ngram viewer, SAS and Mathematica. Btw., the Ngram results for bishop, prelate and presbyter show that bishop completely wipes the other two off the graph. There is a spike in the Bishop line around 1590 that begs for an explanation from real experts on 16c. publications.

Often, graphs of very high quality and statistical expertise are lavished on texts, where the graphs and the attendant statistics not only go over the head of the scholars in the field, but have lost the connection to the reading of a text. Alas, in projects working on up to 80 manuscripts of a tradition, the temptation is to test the outer limits, and I accept that.

In the meanwhile, graphs play a minor role in the BP chase. In the tables above, (only excerpts shown) there are some 2500 data points of BP instances. It is possible to graph 5 or 10 sentences. The graphs show nothing that you cannot see from the data tables. My reaction upon fashioning graphs is: Oh yea, and a quick click to the data tables and the text.

I did make on list of all 18000 data points of BP's and non-BP's, just to be able to make a quick and dirty Excel graph - all they show is a fairly consistent oscillation between short gaps and long gaps.

1,0,
2,0,
3,0,
4,0,
5,0,
6,0,
7,0,
8,0,
9,0,
10,0,
11,0,
12,16
13,0,
14,0,
15,0,
16,11
17,0,
18,0,
19,0,
20,0,
21,0,
22,0,
23,17
24,0,
25,0,
26,0,
27,0,
28,0,
29,0,
30,0,
31,0,
32,0,
33,13
34,0,
35,0,
36,08
37,0,
38,0,
39,0,
40,0,
41,22
42,0,
43,0,
44,0,
45,0,
46,0,
47,0,
48,0,
49,0,
50,0,
51,0,
52,0,
53,0,
54,0,
55,0,
56,0,
57,0,
58,21
...

I am not yet ready to draw any conclusions from the inescapable fact of lage gaps and small gaps. The graphs below will allow you to make up your own minds. Everything here can be repeated so the warning: "Don't try this at home girls and boys does not apply here."

18,000 Data Points of All BP's

The graph of all the data points only shows that there are considerable gaps in the distribution; by sorting the data tables it is quite easy to separate out the big gaps and the little gaps.

The graph below focuses on a smaller context.

This graph focuses on the first 6 sentences, 405 data points. The last two data points, 16 and 16 are the last two BP's in sentence 6. The arrows point to sentence cusps.

The graph below covers the first 46 data points.

106 data points below.

210 data points below. Note that the points represent gaps. Low values point to dense patterns of BP's and large spikes, the absence of BP's.

Even the fine grained graphs do not really tell a story. There is no real connection between the act of reading a text and inspecting the graph. Perhaps it would be possible to create an interface where clicking on a data point would lead into the text.

The same can be achieved with a simple three window text display (cited before). The point is to have easy access to sentences of the Areopagitica. The printed editions do not provide that. The often extremely long sentences are presented in extremely long paragraphs. As I see the task at hand, the logistics of Milton studies need to be improved. The interface below concentrates on bi-labial plosives, but any number of more valuable features could be extracted from the text, put into an abstract form with the links back into the text. We must help the human brain to easier access to our textual tradition. Prof. Fish is one of our great athletes on the court of text. But reading texts and understanding our heritage cannot be left to virtuosi in subjective expression, objective it may seem. Our knowledge of nature began in our civilization with the questions Aristotle presented. The systematic work over two centuries has forced nature to yield many erstwhile secrets. In the 20th c. we have made some great strides to a more universal understanding of the texts of discrete cultures. How can we coexist on the planet with very similar physiological processes, very similar existential challenges, yet with so opposed cultural expressions.

We must make access to text easier. The point is not just to increase access to schools and universities, we must improve the logistics of bringing to texts all that is required to work through them. Many Digital Humanists are convinced the answer lies in automatic processes that can quantify vast amounts of text. Perhaps, likely even. Google has astonished me in the last ten years. Yet, it is not uninteresting to work in depth on a single text of 18,000 words.

Ngram has shown me a blip in the uses of the word bishop in 1590 continuing for some twenty years. The thought will haunt me for the next couple of days. I suspect that it may be merely an accident of what has been scanned. Until we get a more complete and more accurate record of our texts such blips will be little more than phantom images on our still relatively primitive machines.

One of the guiding lights, quite peripheral to what I am doing here, but still a guide into the future of pedagogical work with old texts is the work of Jonathan F. Bennett.

Professor Bennett has had a long career at various universities starting at Cambridge and continuing to universities in Canada and the US. He has gleaned the insight from decades of teaching that very pedestrian language problems are blocking access to ideas from the 17th c. for modern students. In philosophy the problem is not the ideas, but the archaic language in which the ideas are presented. I understand that a student of Milton might be required to deal not only with Milton's ideas but also with his language. The study of Descartes might not operate under the same imperatives. Prof. Bennett does not work with Milton and he is fully aware of the controversial aspect of his recent work and the need not to leave the early modern period completely. He believes that the benefits for students outweigh the imperatives of faithful reproductions of old editions. Prof. Bennett concentrates on philosophy texts:

When students are introduced to the great philosophical works of the early modern period, it is usually in the hope that they will engage with the thoughts and arguments that the texts present. The teaching experience of many of us suggests that most students simply cannot understand these texts. The increasing rate of change in the English language ensures that fewer and fewer of today’s readers can cope with the writings of the 16th-18th centuries. There are difficulties of syntax, length and complexity of sentences, words that are no longer current, still-familiar words used in meanings that they now do not have, arcane references to other philosophers which today’s students will seldom understand or be required to follow up; these and other factors create forbidding obstacles to engaging with these early modern texts. I reduce the obstacles so that students can more easily come to grips with the philosophical thoughts the texts express. Once they do that, they still won’t have an easy time, because the material itself is hard; but their efforts will go into getting philosophical understanding, not decoding old prose. http://www.earlymoderntexts.com/f_why.html

The same thing can be said for the Areopagitica. We read that text not for the poetry of it, Prof. Fish is here the exception, we read it for the ideas. We could start speculating with smoothing the language. I am not completely convinced that it is impossible to separate the language of the 17th c. from the ideas of the Areopagitica.

The argument here, the point to the effort is to encourage text workers to use the resource of the windowed laptop. I am not concerned with the data-miners. I am concerned with specialists on small areas of the text tradition who should find ways to use computers and algorithms to map their field with greater precision. I have explained enough that the text in the window below should be comprehensible. I have no illusion that this work is easy, neither was the path from Lachmann to Cladistics. It may be that the text miners will force us to forget who transmitted what to whom, which we have done now for 200 years with complete philological rigor, and concentrate instead on what Milton is telling us in the first place. Or they may be do both. To use modern scientific methods to revisit questions that lost relevance a hundred years ago seems atavistic nostalgia. Much of what has survived in our academies of textual positivism must be rethought in terms of opening the tradition to readers, not in perusing ever more esoteric provenance studies.

As such, the problem of what is a sentence in Milton becomes less important. For example, is a question - a phrase ending in a question mark - a sentence, even if the following phrase is not capitalized? largely becomes irrelevant. It becomes an easily understandable example of irrelevance. In the past our academic methodologies have tried to reproduce type-setting conventions of the past. In our new electronic editions, we can ignore the conventions of the past and try to recapture the communication. Is there any reason to carry Milton's spelling "wee" into the present? The loss of meaning due to lack of familiarity with 17c. prose is greater than some daring maverick replacing all the "wee" with "we" and capitalizing the first word after a question mark, just to make parsing text easier for perl programs. To some that would be irresponsible vandalism, endangering the transmission, knawing at the foundations. I say, lets save what we can for the students of today. To this end, I plan a few more posts, principally on and-pairs and the use of apostrophe.

Three Widows: 1. percentages, 2. sentence profiles, 3. the text.

So what has all this programming yielded? The answer is not much really. For me, the exercise was not completely uninteresting; I exercised my perl programs. I got a chance to practice some feature extraction I had not tried before, on a text I had not touched since undergraduate days. In addition, programs are living things. When they are awoken and applied to data, they execute logic that works in harmony with the human mind. The program gives me the percentage of BP's in every sentence in the Areopagitica - and sorts them ascending or descending. As such, the program has a life, a script life, a symbiant life designed around the deficiencies of the human brain in assimilating streams of words. As such, it has intrinsic value. Its intrinsic value also requires that it be perfected, optimized, and extended as new questions present themselves. There are some quasi-parental obligations we have towards our programs.

I have resisted interpretation. What about the single digit spans? What about the fact that "liberty" is the first and last BP in sentence four. My weapons have the safety on; these geese are safe for now.

However, there are collateral benefits. In chasing BP's, I have had to pour over the text in some detail. I have checked individual lexical items and tracked down a few chiasmi. I must admit that my attempt to get profit from a sequential reading has yielded sparse results. I have outlined and parsed the first 26 sentences, the introduction. I have followed Milton's history of censorship and gained some insight into the last phases of censorship from the inquisition to the Church of England to to the Presbyters. I have followed Arber's outline in his 1868 reprint and secured a lifeline through the imparsable. It has been curious how library vandals have marked up the library books scanned for Google. There are lines and arrows to track down hidden sentence parts. So I am not the only one who is having problems. However, it s not necessary to disfigure electronic texts. I cannot stress enough how important is is to have the physical aspect of a text well in hand in a multi-window text processor. Milton editions, under the guise of historical bling-bling err on the side of the textual brier-patch.

Milton's argument in the Areopagitica does careen from one unfamiliar reference to the next. I do feel vindicated in my emphasis of Milton's conciliatory mission. He is trying to convince his "Parlament" of the greatness of England and the contribution of its learned men. At the beginning of the final appeal for tolerance in the last 20 sentences, Milton rises above his argument.

Before I return to the programming I would like to share two quotes in sentences relatively easy to parse.

|p321
What else is all that rank of things indifferent, wherein Truth may be on this side, or on the other, without being unlike her self.

Sentence 321 exhibits a healthy perspectivism in a time of absolutism, when few would share this thought: it may be possible for truth to be on opposing sides and still be truth in each case.

Sentence 324 pleads for peace and withholding of judgement.

|p324
How many other things might be tolerated in peace, and left to conscience, had we but charity, and were it not the chief strong hold of our hypocrisie to be ever judging one another.

In this context it is difficult to render my final judgement on the tools Prof. Fish uses to smite Digital Humanities. I continue to maintain that he is rushing down a doctrinal debate that is not an essential, only a peripheral theme in the Areopagitica. Whether he is just caught in a subjective moment, or lampooning, or just stuck with a unreflected argument, I cannot say. In any case, sharp reflexes that are so helpful in sport, should be restrained in hermeneutics.

I do appreciate having been goaded through this extended tour through Milton; wish I had some standing to interpret in the 17th c.

Before I close this project I plan some more posts:

1. to present the latest version of the script which traces a path from unnumbered sentences to ever more focused list. The point being to attain effiiciencies by starting with the text in each of the literally hundreds of test runs. That way on can fix a typo in the Milton text and not interfere with tests on the percent calculations. The corrected word, or a faux sentence cusp, will automatically be carried to the latest list. The production of analysis tool goes hand in hand with cleaning the text. One chief task of the "cleaning" is to smooth out the arbitrary and unsystematic printing conventions of yore.
2. to extract all and pairs and find several views that shed some light on equivalences in Milton's thought at this time (and to recommend the extraction of conjunctions and their arguments as a general methodology with texts);
3. to extract the various words with apostrophe to indicate missing words.

More on this later.

For now I would like to return to the BP chase and discuss the latest version of the script. Please go to the next, newer post.

Friday, March 9, 2012

APPENDIX C: The perl Programs

APPENDIX C: The perl Programs

A (belated) NOTE: My plans have not worked out for this programming project and I just found out that I had some issues with the perl code displayed in this post. I have rethought the problem and posted the "revised" code with further explanations in a new series of posts. However, at present, textareas are not a happy solution Naively, I thought a "code" tag would take care of everything. That is how I would have done it. It does some things but cannot deal with the angle brackets of a while loop. All robots should keep their skinny fingers off the text inside the "code" tag. Who did away with that anyway?Alas. Anyway, my apologies to all twelve persons who clicked this post. The problem is fixed, but it was a lot of trouble. ANYWAY, everything about last years blogs still stands, I have learned much about Milton. And will continue to post. cheers, PB

NOTE: A future blog will present some discussion of the programs. In this blog it is important to appreciate the applications of "programming" to one of the classics of the English language, the Areopagitica. The format for the presentation of the six perl programs will be:

a. INPUT, FILENAME, INPUT DATA SAMPLE,
b. PROGRAM STATEMENTS, PROGRAM LOGIC,
c. OUTPUT, FILENAME, OUTPUT DATA SAMPLE.

1. Program ONE - Creating a list of words one word per line. Counting sentences.

INPUT: The text of the Areopagitica. Minor pre-processing was done to mark sentence cusps with XXX. This is done as a convenience and aid for modern readers, e.g. me, since the 17c. prose style requires intense parsing of individual sentences. (Sample only, the actual program parses all sentences, ca. 300; sample contains: SENTENCES 1 to 3 ... and LAST SENTENCE.)

FILENAME: ae.txt

START INPUT FOR PROGRAM ONE

They who to States and Governours of the Commonwealth direct their Speech, High Court of Parlament, or wanting such accesse in a private condition, write that which they foresee may advance the publick good; I suppose them as at the beginning of no meane endeavour, not a little alter'd and mov'd inwardly in their mindes: Some with doubt of what will be the successe, others with fear of what will be the censure; some with hope, others with confidence of what they have to speake.
XXX
And me perhaps each of these dispositions, as the subject was whereon I enter'd, may have at other times variously affected; and likely might in these formost expressions now also disclose which of them sway'd most, but that the very attempt of this addresse thus made, and the thought of whom it hath recourse to, hath got the power within me to a passion, farre more welcome then incidentall to a Preface.
XXX
Which though I stay not to confesse ere any aske, I shall be blamelesse, if it be no other, then the joy and gratulation which it brings to all who wish and promote their Countries liberty; whereof this whole Discourse propos'd will be a certaine testimony, if not a Trophey.
XXX
.
.
.
But of these Sophisms and Elenchs of marchandize I skill not: This I know, that errors in a good government and in a bad are equally almost incident; for what Magistrate may not be mis-inform'd, and much the sooner, if liberty of Printing be reduc't into the power of a few; but to redresse willingly and speedily what hath bin err'd, and in highest autority to esteem a plain advertisement more then others have done a sumptuous bribe, is a vertue (honour'd Lords and Commons) answerable to Your highest actions, and whereof none can participat but greatest and wisest men.
XXX

END OF INPUT FOR PROGRAM ONE

START PROGRAM BELOW "###" Indicates comments on the statements on left.


#!c:\perl                     ###Perl requires this first line.

open (H, "ae.txt");           ###Queue up text file for processing.
open (OUT, ">wordlist.txt");  ###Queue empty file for the wordlist.

$ln=0;               ###Line count is 0.

open (C, "ae.txt");  ###Open text file.
while ($line = <H>){ #########Start reading TEXT - get line.

chomp $line;   ###Renove trailing line feed - don't worry about this.
               ###Line feeds i.e. characters indicating an new line
               ###get in the way. Remove them from the line and add
               ###them later when you need them. The character is "\n".
               ###Look four lines below, exacting but not that weird.
               ###A place for everything and everything in its place.

if ($line =~ /XXX/){          ###Check if there is a line
                              ###or a sentence cusp.
         
$ln++;                        ###Increment sentence count.
print OUT ("$ln-XXX\n");      ###Output: sentence count, dash,
                              ###sentence cusp and line feed "\n".
next;                         ###Get next line.
}

my @f = split(/ /,$line);     ##Split line into individual
                              ##words in the array @f.

foreach $f (@f){     ###Process each word in array @f.
print OUT "$f\n";    ###Put word on its own line
                     ###into the output file >wordlist.   
}                    ###End of word loop - get next word.
}                    ###End of line loop - get next line.

close H;             ###Shut down shop.
close OUT;           ###Shut down output file.


END PROGRAM ABOVE (TWO LINES)






PROGRAM STATEMENTS WITH INDENTS TO HIGHLIGHT THE LOGIC: 

#!c:\perl    

open (H, "ae.txt");           
open (OUT, ">wordlist.txt");  

$ln=0;
open (C, "ae.txt");                  #OPEN INPUT FILE 

while ($line = <H>){                    #START WHILE                 

     chomp $line; 

     if ($line =~ /XXX/){            #START IF 
          $ln++;           
          print OUT ("$ln-XXX\n");                             
          next;                         
     }                               #END IF

     my @f = split(/ /,$line);       #SPLIT line

     foreach $f (@f){                #START FOREACH
          print OUT "$f\n";    
   
     }                               #END FOREACH

}                                    #END WHILE

close H; 
close OUT;

NOTE 1: WHILE picks up one line at a time "while" there are lines in the INPUT file. It stops after the last one.

NOTE 2: IF tests the line if it is a cusp XXX or if it is a word. If it is a cusp, it performs the increment and prints out the cusp with the sentence number. The sentence number appears AFTER the words in the sentence.

NOTE 3: SPLIT takes a line of the text of the Areopagitica and splits it into a "stack" of words, literally a stack. It is called an ARRAY; the name of the array is "@f", could be anything with an "@" in front..

NOTE 4: FOREACH processes each item in the stack (above) one at a time. In this case it just adds it to the bottom of the OUTPUT file with a line feed, "\n".








OUTPUT: List of words from the Areopagitica, sentence one to three, sentence cusps marked and counted. (Sample only shown, [. . .] indicate gap, program produces complete list, 18,000 lines on 18,000 lines in 357 sentences.) This is considered a tiny dataset. A small first step.



FILENAME: wordlist.txt



START OUTPUT PROGRAM ONE

They
who
to
States
and
Governours
of
the
Commonwealth
.
.
.
beginning
of
no
meane
endeavour,
not
a
little
alter'd
and
mov'd
inwardly
in
their
mindes:
Some
.
.
.
others
with
confidence
of
what
they
have
to
speake.
1-XXX
And
me
perhaps
each
of
these
dispositions,
as
.
.
.
power
within
me
to
a
passion,
farre
more
welcome
then
incidentall
to
a
Preface.
2-XXX
Which
though
.
.
.
be
a
certaine
testimony,
if
not
a
Trophey.
3-XXX
.
.
.
[ed: last sentence below]
But
of
these
Sophisms
and
Elenchs
of
marchandize
I
skill
not:
This
I
know,
that
.
.
.
sooner,
if
liberty
of
Printing
be
reduc't
into
the
power
of
a
few;
but
to
redresse
willingly
.
.
.
whereof
none
can
participat
but
greatest
and
wisest
men.
357-XXX



END OF OUTPUT PROGRAM ONE








2. Program TWO - counting the words



INPUT: List of words from the Areopagitica, sentence one to three, sentence cusps marked and counted.



FILENAME: wordlist.txt (SAMPLE ONLY, shortened for this text . . .)



START INPUT

They

who

to

States

and

Governours

of

the

Commonwealth

direct

their

Speech,

High

Court

of

Parlament,

or

.

.

.

confidence

of

what

they

have

to

speake.

1-XXX

And

me

perhaps

each

of

these

dispositions,

as

the

subject

was

whereon

I

enter'd,

.

.

.

welcome

then

incidentall

to

a

Preface.

2-XXX

Which

though

I

stay

not

to

confesse

.

.

.

will

be

a

certaine

testimony,

if

not

a

Trophey.

3-XXX

END OF INPUT






START PROGRAM BELOW

#!c:\perl

open (H, "wordlist.txt");
open (OUT, ">wordnum.txt");

$n=1;

while ($line = <H>){
if ($line =~ /XXX/){       ###START IF - If it is a sentence 
                           ###cusp, do this below.

foreach $senlist (@senlist){        ###Start processing senlist array.
                                    ###Start by splitting the stack 
         ###@senlist into individual words.
         ###We are pacticing manipulatin arrays.
print OUT ("$n","-","$senlist");    ###Put out the word and its position,
$n++;                               ###increment the word counter,
$m++;                               ###increment the sentence counter, and
}         ###drop out of sentence cusp "if".
                                    ###Do not process the "else";
         ###we are practicing "if - else".
}else{                    ###Else if it is a word - do this below
push (@senlist, $line);   ###puch each word of a sentence on a stack
next;                     ###get the next word
}                         ###END OF IF 
print OUT ("$m","-","$line"); ###OUT: number of words in sen, dash and marker 
                              ###XXX. Line already has the number of sen.
                              ###You will get here only if you drop through
                              ###the IF. The ELSE will take you up to 
         ###get the next line. That is where the test
         ###"cusp or word" takes place, there are no
         ###words "XXX". If you test positive for XXX in
         ###IF, you process the array @senlist, i.e. 
         ###you count the words and when finished with  
         ###the sentence you end up here. All a bit 
         ###cryptic, I know, but it works.
$m=0;                         ###Reset the word coumter.
undef @senlist;               ###Reset the array for the words of the next sen.
}                             ###END OF WHILE LOOP - get next line if there
                              ###is one, else, all done close up shop.
close OUT;
close H;

END PROGRAM ABOVE






PROGRAM STATEMENTS WITH INDENTS TO HIGHLIGHT THE LOGIC: 

#!c:\perl

open (H, "wordlist.txt");
open (OUT, ">wordnum.txt");

$n=1;

while ($line = <H>){           START WHILE

if ($line =~ /XXX/){           START IF

foreach $senlist (@senlist){      START FOREACH
print OUT ("$n","-","$senlist");   
$n++;                              
$m++;                             
}                                 END FOREACH
                                  
}else{                         END IF    START ELSE
push (@senlist, $line);   
next;                     
}                                        END ELSE           

print OUT ("$m","-","$line"); 
$m=0;                         
undef @senlist;               

}                             END OF WHILE LOOP                               
close OUT;
close H;

NOTE 1: WHILE picks up a line from the file "wordslist" and stops after the last cusp XXX.

NOTE 2: IF tests whether the line is a cusp or if it is a word.

NOTE 3: FOREACH starts IF the line is a cusp XXX. That means take the array "@senlist" and print each item in the stack (first in, first out) with the word counter "$n" in front viz. 85-1-XXX.
NOTE 4: ELSE (i.e. NOT IF) push the current line (from WHILE) onto the array (stack) "@senlist).

The idea is to hols the sentence stacked in the array and at the next cusp, pop them out one at a time (first in first out, second in second out) and go get the next line from WHILE.




OUTPUT: The first 100 words - sentence cusp marked and counted at position 85, for example.



FILENAME: wordnum.txt



START OUTPUT

1-They

2-who

3-to

4-States

5-and

6-Governours

7-of

8-the

9-Commonwealth

10-direct

11-their

12-Speech,

13-High

14-Court

15-of

16-Parlament,

17-or

18-wanting

19-such

20-accesse

21-in

22-a

23-private

24-condition,

25-write

26-that

27-which

28-they

29-foresee

30-may

31-advance

32-the

33-publick

34-good;

35-I

36-suppose

37-them

38-as

39-at

40-the

41-beginning

42-of

43-no

44-meane

45-endeavour,

46-not

47-a

48-little

49-alter'd

50-and

51-mov'd

52-inwardly

53-in

54-their

55-mindes:

56-Some

57-with

58-doubt

59-of

60-what

61-will

62-be

63-the

64-successe,

65-others

66-with

67-fear

68-of

69-what

70-will

71-be

72-the

73-censure;

74-some

75-with

76-hope,

77-others

78-with

79-confidence

80-of

81-what

82-they

83-have

84-to

85-speake.

85-1-XXX

86-And

87-me

88-perhaps

89-each

90-of

91-these

92-dispositions,

93-as

94-the

95-subject

96-was

97-whereon

98-I

99-enter'd,

100-may (file continues to 18,000)

END OUTPUT










3. Program THREE. Extracting the words with bi-labials.



INPUT: The first 100 words - sentence cusp marked and counted at position 85.



FILENAME: wordnum.txt (SAMPLE ONLY)



START INPUT

1-They

2-who

3-to

4-States

5-and

6-Governours

7-of

.

.

.

53-in

54-their

55-mindes:

56-Some

57-with

58-doubt

59-of

60-what

61-will

62-be

63-the

64-successe,

65-others

66-with

67-fear

68-of

.

.

.

79-confidence

80-of

81-what

82-they

83-have

84-to

85-speake.

85-1-XXX

86-And

87-me

88-perhaps

89-each

90-of

91-these

92-dispositions,

93-as

94-the

95-subject

96-was

97-whereon

98-I

99-enter'd,

100-may

END INPUT






START PROGRAM BELOW

#!c:\perl

open (H, "wordnum.txt");      ###Queue up list of words and their position.
open (OUT, ">pbnum.txt");     ###Queue empty file for the bi-labials.
$i=0;                         ###Start bi-labial counter at 0/
while ($line = <H>){          ###Start reading file with words and positions.
                              ###Get a word or a cusp marker.
if ($line =~ /XXX/) {            ###Check if there is a line
                                 ###or a sentence cusp?
print OUT ("$i","-","$line");    ###IF cusp, OUT PUT bi-lab count and
                                 ###the line with the cusp marker which
         ###also carries the number of words in the
         ###sentence and the sen. number, viz. ###85-1-XXX becomes 11-85-1-XXX.
         ###READ: 11 bi-labs, 85 words, sen 1.
$i=0;                            ###Reset bi-lab counter for next sen.
next;                            ###Get next line, word or cusp.
}
if ($line =~ /b/) {              ###The next statement test for the various
                                 ###forms of bi-labials. The format is the
         ###same for each: 1. find it, 2. increment
         ###counter print out line with word 
         ###containing bi-lab and its position
         ###in text. Get next line.         
$i++;                 
print OUT ("$line");
next;
}
if ($line =~ /p/) {
$i++;
print OUT ("$line");
next;
}
if ($line =~ /B/) {
$i++;
print OUT ("$line");
next;
}
if ($line =~ /P/) {
$i++;
print OUT ("$line");
next;
}
}                                ###END of WHILE LOOP get next line
                                 ###if the previous word did not have
         ###a bi-lab. Had it had a bi-lab, a
         ###"next" in one of the IF's would
         ###have gotten the next line.
         ###Not that weird - algorithmic.
close OUT;
close H;

END PROGRAM ABOVE






OUTPUT: Words with bp are extracted and formatted with their position in the text. Sentence cusp marked: number of pb words, total words in sentence, sentence number, cusp marker XXX.



FILENAME: pbnum.txt



START INPUT

12-Speech,

16-Parlament,

23-private

33-publick

36-suppose

41-beginning

58-doubt

62-be

71-be

76-hope,

85-speake.

11-85-1-XXX

88-perhaps

92-dispositions,

95-subject

113-expressions

122-but

126-attempt

144-power

149-passion,

157-Preface.

9-72-2-XXX

170-be

171-blamelesse,

174-be

184-brings

190-promote

193-liberty;

198-propos'd

200-be

207-Trophey.

9-50-3-XXX

END OUTPUT






4. Program FOUR - Formatting the Bi-labials for further study



INPUT: words containing b or p, with numerical position in the text. Sentence cusps with number of pb words, total number of words, sentence number, cusp marker XXX.



FILENAME: pbnum.txt (SAMPLE ONLY)



START INPUT

12-Speech,

16-Parlament,

23-private

33-publick

36-suppose

41-beginning

58-doubt

62-be

71-be

76-hope,

85-speake.

11-85-1-XXX

88-perhaps

92-dispositions,

95-subject

113-expressions

122-but

126-attempt

144-power

149-passion,

157-Preface.

9-72-2-XXX

170-be

171-blamelesse,

174-be

184-brings

190-promote

193-liberty;

198-propos'd

200-be

207-Trophey.

9-50-3-XXX

213-liberty

217-hope,

234-expect;

235-but

237-complaints

241-deeply

244-speedily

250-bound

253-liberty

9-52-4-XXX

END INPUT






START PROGRAM

#!c:\perl

open (G, "pbnum.txt");
open (OUT, ">pbnum2.txt");

my $flag = "one";

while ($line = <G>){

if ($line =~ /XXX/) {
print OUT ("$line\n");
next;
}

my @numb = split(/ /, $line);

$hold=$numb[0];
$word=$numb[1];

if ($flag eq "one"){
$flag = "two";
$diff =($hold - 1);
$prev = $hold;
} else {
$diff=$hold-$prev;
$prev=$hold;
}

print OUT ("diff ");
print OUT ("$diff ");
print OUT ("$line");
}
close OUT;
close G;

END






OUTPUT: words with bp, difference to previous pb word, words containing b or p, with numerical position in the text. Sentence cusp marked as before in INPUT.



FILENAME: pbnum2



START OUTPUT

diff 11 12,Speech,

diff 4 16,Parlament,

diff 7 23,private

diff 10 33,publick

diff 3 36,suppose

diff 5 41,beginning

diff 17 58,doubt

diff 4 62,be

diff 9 71,be

diff 5 76,hope,

diff 9 85,speake.

11,85,1,XXX

diff 3 88,perhaps

diff 4 92,dispositions,

diff 3 95,subject

diff 18 113,expressions

diff 9 122,but

diff 4 126,attempt

diff 18 144,power

diff 5 149,passion,

diff 8 157,Preface.

9,72,2,XXX

diff 13 170,be

diff 1 171,blamelesse,

diff 3 174,be

diff 10 184,brings

diff 6 190,promote

diff 3 193,liberty;

diff 5 198,propos'd

diff 2 200,be

diff 7 207,Trophey.

9,50,3,XXX

END OUTPUT










PROGRAM 5: Calculate ratio of PB words for each sentence.



INPUT: The sentence cusps hold the total words in each sentence and the pb words. We are practicing programming.



FILENAME: pbnum2.txt (SAMPLE ONLY)



START INPUT

diff 11 12-Speech,

diff 4 16-Parlament,

diff 7 23-private

diff 10 33-publick

diff 3 36-suppose

diff 5 41-beginning

diff 17 58-doubt

diff 4 62-be

diff 9 71-be

diff 5 76-hope,

diff 9 85-speake.

11-85-1-XXX

diff 3 88-perhaps

diff 4 92-dispositions,

diff 3 95-subject

diff 18 113-expressions

diff 9 122-but

diff 4 126-attempt

diff 18 144-power

diff 5 149-passion,

diff 8 157-Preface.

9-72-2-XXX

diff 13 170-be

diff 1 171-blamelesse,

diff 3 174-be

diff 10 184-brings

diff 6 190-promote

diff 3 193-liberty;

diff 5 198-propos'd

diff 2 200-be

diff 7 207-Trophey.

9-50-3-XXX

END INPUT






START PROGRAM BELOW

#!c:\perl

open (W, "pbnum2.txt");
open (OUT, ">pbstat2.txt");


while ($line = <W>){
if ($line =~ /XXX/) {
my @numb = split(/-/,$line);
$pb=$numb[0];
$stot=$numb[1];
$sennum=$numb[2];
if ($pb eq 0){
print OUT ("ZERO, $stot, $sennum\n");
next;
} else {
$percent=($pb/$stot);
printf OUT ("%3.2f",$percent);
print OUT (", $pb, $stot, $sennum\n");
}
}
}
close OUT;
close G;

END PROGRAM






OUTPUT: Ratio pb/total, pb words, total words in sentence, sentence number.



FILENAME: pbstat2



START OUTPUT

0.13, 11, 85, 1

0.13, 9, 72, 2

0.18, 9, 50, 3

0.17, 9, 52, 4

0.10, 8, 78, 5

0.14, 10, 69, 6

0.15, 19, 123, 7

0.17, 7, 42, 8

0.16, 15, 94, 9

0.14, 11, 78, 10

0.10, 8, 78, 11

0.10, 5, 51, 12

0.08, 4, 48, 13

0.19, 6, 31, 14

0.14, 18, 129, 15

0.19, 19, 99, 16

0.26, 11, 42, 17

0.18, 16, 88, 18

0.22, 11, 49, 19

0.15, 13, 89, 20

0.21, 6, 28, 21

0.09, 5, 53, 22

0.31, 10, 32, 23

0.03, 1, 39, 24

0.13, 12, 89, 25

END OUTPUT








PROGRAM 6: Sort the percentages in descending order. Also select sentences over 14%.



INPUT: File with ratios or percentages of pb words.



FILENAME: pbstat2.txt (sample only)



START INPUT

0.13, 11, 85, 1

0.13, 9, 72, 2

0.18, 9, 50, 3

0.17, 9, 52, 4

0.10, 8, 78, 5

0.14, 10, 69, 6

0.15, 19, 123, 7

0.17, 7, 42, 8

0.16, 15, 94, 9

0.14, 11, 78, 10

0.10, 8, 78, 11

0.10, 5, 51, 12

0.08, 4, 48, 13

0.19, 6, 31, 14

0.14, 18, 129, 15

0.19, 19, 99, 16

0.26, 11, 42, 17

0.18, 16, 88, 18

0.22, 11, 49, 19

0.15, 13, 89, 20

0.21, 6, 28, 21

0.09, 5, 53, 22

0.31, 10, 32, 23

0.03, 1, 39, 24

0.13, 12, 89, 25

END INPUT






START PROGRAM

#!c:\perl

open (W, "pbstat2.txt");
open (OUT, ">pbstat3.txt");
open (SM, ">pbstat3sm.txt");

while ($line = <W>){

if ($line =~ /Z/){
next;
}
push (@pblist, $line);
my @g = split(/,/, $line);
if ($g[0] gt .14){
push (@pblistsm, $line);
}
}
@sortpb = sort {$b <=> $a} @pblist;
@sortpbsm = sort {$b <=> $a} @pblistsm;


print OUT (" ");
print OUT "@sortpb";
print SM "@pblistsm";
close OUT;
close SM;
close W;

END






OUTPUT 1: Sorted Percentages.



FILENAME: pbstat3.txt



START OUTPUT 1

0.31, 15, 49, 38

0.31, 10, 32, 23

0.31, 4, 13, 105

0.30, 7, 23, 94

0.29, 6, 21, 108

0.28, 7, 25, 56

0.28, 13, 47, 228

0.27, 10, 37, 117

0.27, 6, 22, 168

0.26, 11, 42, 17

0.26, 18, 70, 349

0.26, 9, 35, 171

0.25, 7, 28, 54

0.25, 9, 36, 131

0.25, 3, 12, 146

0.25, 7, 28, 46

0.25, 2, 8, 321

0.24, 8, 34, 212

0.24, 6, 25, 58

0.24, 4, 17, 291

0.24, 8, 34, 356

0.24, 14, 58, 254

0.24, 8, 34, 172

0.23, 10, 44, 151

0.23, 5, 22, 154

END OUTPUT 1



OUTPUT 2: Percentages over 13.



FILENAME:  pbstatsm.txt



START OUTPUT 2

0.18, 9, 50, 3

0.17, 9, 52, 4

0.15, 19, 123, 7

0.17, 7, 42, 8

0.16, 15, 94, 9

0.19, 6, 31, 14

0.19, 19, 99, 16

0.26, 11, 42, 17

0.18, 16, 88, 18

0.22, 11, 49, 19

0.15, 13, 89, 20

0.21, 6, 28, 21

0.31, 10, 32, 23

0.19, 13, 67, 26

0.15, 6, 39, 27

.

.

.

0.23, 8, 35, 206

0.17, 5, 29, 208

0.18, 5, 28, 209

0.24, 8, 34, 212

0.17, 14, 80, 213

0.15, 5, 33, 217

0.15, 6, 40, 218

0.22, 15, 67, 222

0.17, 8, 46, 223

0.22, 11, 51, 224

0.18, 6, 33, 226

0.15, 8, 55, 227

0.28, 13, 47, 228

0.16, 14, 87, 229

0.15, 13, 88, 230

0.17, 6, 36, 231

0.18, 5, 28, 233

0.19, 8, 43, 234

0.21, 6, 28, 236

0.15, 6, 40, 237

0.21, 4, 19, 239

0.18, 14, 76, 243

0.15, 13, 87, 244

0.15, 10, 68, 250

0.19, 18, 95, 251

0.24, 14, 58, 254

0.21, 21, 99, 257

0.15, 14, 95, 258

0.17, 7, 42, 260

0.21, 7, 33, 262

0.17, 8, 46, 264

0.16, 12, 76, 266

0.19, 9, 48, 272

0.15, 4, 27, 273

0.15, 5, 33, 276

0.22, 13, 58, 277

0.17, 5, 29, 282

0.15, 13, 88, 288

0.24, 4, 17, 291

END OUTPUT 2