Friday, March 9, 2012

APPENDIX C: The perl Programs

A (belated) NOTE: My plans for this programming project have not worked out, and I just found out that there were some issues with the perl code displayed in this post. I have rethought the problem and posted the "revised" code with further explanations in a new series of posts. At present, however, textareas are not a happy solution. Naively, I thought a "code" tag would take care of everything. That is how I would have done it. It does some things but cannot deal with the angle brackets of a while loop. All robots should keep their skinny fingers off the text inside the "code" tag. Who did away with that anyway? Alas. Anyway, my apologies to all twelve persons who clicked this post. The problem is fixed, but it was a lot of trouble. ANYWAY, everything about last year's blogs still stands. I have learned much about Milton and will continue to post. Cheers, PB

NOTE: A future blog will present some discussion of the programs. In this blog it is important to appreciate the applications of "programming" to one of the classics of the English language, the Areopagitica. The format for the presentation of the six perl programs will be:

a. INPUT, FILENAME, INPUT DATA SAMPLE,
b. PROGRAM STATEMENTS, PROGRAM LOGIC,
c. OUTPUT, FILENAME, OUTPUT DATA SAMPLE.



1. Program ONE - Creating a list of words one word per line. Counting sentences.

INPUT:  The text of the Areopagitica. Minor pre-processing was done to mark sentence cusps with XXX. This is done as a convenience and aid for modern readers, e.g. me, since the 17c. prose style requires intense parsing of individual sentences. (Sample only, the actual program parses all sentences, ca. 300; sample contains: SENTENCES 1 to 3 ... and LAST SENTENCE.)

FILENAME: ae.txt

START INPUT FOR PROGRAM ONE
They who to States and Governours of the Commonwealth direct their Speech, High Court of Parlament, or wanting such accesse in a private condition, write that which they foresee may advance the publick good; I suppose them as at the beginning of no meane endeavour, not a little alter'd and mov'd inwardly in their mindes: Some with doubt of what will be the successe, others with fear of what will be the censure; some with hope, others with confidence of what they have to speake.
XXX
And me perhaps each of these dispositions, as the subject was whereon I enter'd, may have at other times variously affected; and likely might in these formost expressions now also disclose which of them sway'd most, but that the very attempt of this addresse thus made, and the thought of whom it hath recourse to, hath got the power within me to a passion, farre more welcome then incidentall to a Preface.
XXX
Which though I stay not to confesse ere any aske, I shall be blamelesse, if it be no other, then the joy and gratulation which it brings to all who wish and promote their Countries liberty; whereof this whole Discourse propos'd will be a certaine testimony, if not a Trophey.
XXX
.
.
.
But of these Sophisms and Elenchs of marchandize I skill not: This I know, that errors in a good government and in a bad are equally almost incident; for what Magistrate may not be mis-inform'd, and much the sooner, if liberty of Printing be reduc't into the power of a few; but to redresse willingly and speedily what hath bin err'd, and in highest autority to esteem a plain advertisement more then others have done a sumptuous bribe, is a vertue (honour'd Lords and Commons) answerable to Your highest actions, and whereof none can participat but greatest and wisest men.
XXX
END OF INPUT FOR PROGRAM ONE



START PROGRAM BELOW   "###" Indicates comments on the statements on left.
END PROGRAM ABOVE (TWO LINES)




PROGRAM STATEMENTS WITH INDENTS TO HIGHLIGHT THE LOGIC:
NOTE 1: WHILE picks up one line at a time "while" there are lines in the INPUT file. It stops after the last one.
NOTE 2: IF tests whether the line is a cusp XXX or a line of text. If it is a cusp, it increments the sentence counter and prints the cusp with the sentence number. The sentence number therefore appears AFTER the words of the sentence.
NOTE 3: SPLIT takes a line of the text of the Areopagitica and splits it into a "stack" of words, literally a stack. It is called an ARRAY; the name of the array is "@f", but it could be any name with an "@" in front.
NOTE 4: FOREACH processes each item in the stack (above) one at a time. In this case it just adds it to the bottom of the OUTPUT file with a line feed, "\n".
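
The four notes above can be sketched in a few lines of perl. This is a minimal sketch written from the notes and the sample output, not the original listing: the subroutine name "words_per_line" and the counter "$sentence" are hypothetical, and the sketch works on a list of lines rather than on ae.txt so that it can be tried on its own. Only the array name "@f" comes from NOTE 3.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the original program):
# turns lines of text into one word per line and replaces each XXX cusp
# with "<sentence number>-XXX".
sub words_per_line {
    my @lines    = @_;
    my $sentence = 0;    # sentence counter, incremented at every cusp
    my @out;

    # WHILE: pick up one line at a time until none are left.
    while (defined(my $line = shift @lines)) {
        chomp $line;
        if ($line =~ /^XXX\s*$/) {      # IF: the line is a cusp
            $sentence++;
            push @out, "$sentence-XXX";
        }
        else {
            my @f = split ' ', $line;   # SPLIT: the line becomes the array @f
            foreach my $word (@f) {     # FOREACH: one word per output line
                push @out, $word;
            }
        }
    }
    return @out;
}

# Small demonstration on two shortened "sentences".
my @demo = words_per_line("They who to States\n", "XXX\n",
                          "And me perhaps\n",     "XXX\n");
print "$_\n" for @demo;
```

With a real file, the same loop would read from a filehandle opened on ae.txt and print to one opened on wordlist.txt; the list version is used here only to keep the sketch self-contained.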




OUTPUT: List of words from the Areopagitica, sentences one to three, with sentence cusps marked and counted. (Sample only shown; [. . .] indicates a gap. The program produces the complete list, 18,000 words on 18,000 lines in 357 sentences.) This is considered a tiny dataset. A small first step.

FILENAME: wordlist.txt

START OUTPUT PROGRAM ONE
They
who
to
States
and
Governours
of
the
Commonwealth
.
.
.
beginning
of
no
meane
endeavour,
not
a
little
alter'd
and
mov'd
inwardly
in
their
mindes:
Some
.
.
.
others
with
confidence
of
what
they
have
to
speake.
1-XXX
And
me
perhaps
each
of
these
dispositions,
as
.
.
.
power
within
me
to
a
passion,
farre
more
welcome
then
incidentall
to
a
Preface.
2-XXX
Which
though
.
.
.
be
a
certaine
testimony,
if
not
a
Trophey.
3-XXX
.
.
.
[ed: last sentence below]
But
of
these
Sophisms
and
Elenchs
of
marchandize
I
skill
not:
This
I
know,
that
.
.
.
sooner,
if
liberty
of
Printing
be
reduc't
into
the
power
of
a
few;
but
to
redresse
willingly
.
.
.
whereof
none
can
participat
but
greatest
and
wisest
men.
357-XXX


END OF OUTPUT PROGRAM ONE




2. Program TWO - counting the words

INPUT: List of words from the Areopagitica, sentence one to three, sentence cusps marked and counted.

FILENAME: wordlist.txt (SAMPLE ONLY, shortened for this text . . .)

START INPUT
They
who
to
States
and
Governours
of
the
Commonwealth
direct
their
Speech,
High
Court
of
Parlament,
or
.
.
.
confidence
of
what
they
have
to
speake.
1-XXX
And
me
perhaps
each
of
these
dispositions,
as
the
subject
was
whereon
I
enter'd,
.
.
.
welcome
then
incidentall
to
a
Preface.
2-XXX
Which
though
I
stay
not
to
confesse
.
.
.
will
be
a
certaine
testimony,
if
not
a
Trophey.
3-XXX
END OF INPUT



START PROGRAM BELOW
END PROGRAM ABOVE



PROGRAM STATEMENTS WITH INDENTS TO HIGHLIGHT THE LOGIC:
NOTE 1: WHILE picks up a line from the file "wordlist.txt" and stops after the last cusp XXX.
NOTE 2: IF tests whether the line is a cusp or a word.
NOTE 3: FOREACH starts IF the line is a cusp XXX. It takes the array "@senlist" and prints each item in the stack (first in, first out) with the word counter "$n" in front. The cusp itself is then printed with the counter of the last word in front, viz. 85-1-XXX.
NOTE 4: ELSE (i.e. NOT IF) pushes the current line (from WHILE) onto the array (stack) "@senlist".
The idea is to hold the sentence stacked in the array and, at the next cusp, take the words out one at a time (first in first out, second in second out), then go get the next line from WHILE.
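
The logic of these notes can be sketched as follows. This is a hedged reconstruction from the notes and the sample output, not the original listing: the subroutine name "number_words" is hypothetical, and the sketch takes a list of lines instead of reading wordlist.txt. The names "@senlist" and "$n" are the ones used in the notes.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the original program):
# numbers every word and prefixes each cusp with the running word count.
sub number_words {
    my @lines = @_;
    my $n = 0;        # running word counter
    my @senlist;      # words of the sentence currently being collected
    my @out;

    foreach my $line (@lines) {
        chomp(my $item = $line);
        if ($item =~ /^\d+-XXX$/) {          # IF: the line is a cusp
            foreach my $word (@senlist) {    # FOREACH: first in, first out
                $n++;
                push @out, "$n-$word";
            }
            push @out, "$n-$item";           # e.g. 85-1-XXX
            @senlist = ();                   # start collecting the next sentence
        }
        else {                               # ELSE: hold the word until the cusp
            push @senlist, $item;
        }
    }
    return @out;
}

# Small demonstration on a shortened word list.
my @demo = number_words("They", "who", "1-XXX", "And", "2-XXX");
print "$_\n" for @demo;
```

One consequence of this design is that any words after the last cusp are never printed, which agrees with NOTE 1: the output stops after the last cusp XXX.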


OUTPUT: The first 100 words - sentence cusp marked and counted at position 85, for example.

FILENAME: wordnum.txt

START OUTPUT
1-They
2-who
3-to
4-States
5-and
6-Governours
7-of
8-the
9-Commonwealth
10-direct
11-their
12-Speech,
13-High
14-Court
15-of
16-Parlament,
17-or
18-wanting
19-such
20-accesse
21-in
22-a
23-private
24-condition,
25-write
26-that
27-which
28-they
29-foresee
30-may
31-advance
32-the
33-publick
34-good;
35-I
36-suppose
37-them
38-as
39-at
40-the
41-beginning
42-of
43-no
44-meane
45-endeavour,
46-not
47-a
48-little
49-alter'd
50-and
51-mov'd
52-inwardly
53-in
54-their
55-mindes:
56-Some
57-with
58-doubt
59-of
60-what
61-will
62-be
63-the
64-successe,
65-others
66-with
67-fear
68-of
69-what
70-will
71-be
72-the
73-censure;
74-some
75-with
76-hope,
77-others
78-with
79-confidence
80-of
81-what
82-they
83-have
84-to
85-speake.
85-1-XXX
86-And
87-me
88-perhaps
89-each
90-of
91-these
92-dispositions,
93-as
94-the
95-subject
96-was
97-whereon
98-I
99-enter'd,
100-may (file continues to 18,000)
END OUTPUT





3. Program THREE. Extracting the words with bi-labials.

INPUT: The first 100 words - sentence cusp marked and counted at position 85.

FILENAME: wordnum.txt (SAMPLE ONLY)

START INPUT
1-They
2-who
3-to
4-States
5-and
6-Governours
7-of
.
.
.
53-in
54-their
55-mindes:
56-Some
57-with
58-doubt
59-of
60-what
61-will
62-be
63-the
64-successe,
65-others
66-with
67-fear
68-of
.
.
.
79-confidence
80-of
81-what
82-they
83-have
84-to
85-speake.
85-1-XXX
86-And
87-me
88-perhaps
89-each
90-of
91-these
92-dispositions,
93-as
94-the
95-subject
96-was
97-whereon
98-I
99-enter'd,
100-may
END INPUT



START PROGRAM BELOW
END PROGRAM ABOVE



OUTPUT: Words containing b or p are extracted and written with their position in the text. Each sentence cusp is marked with: number of pb words, total words in sentence, sentence number, cusp marker XXX.

FILENAME: pbnum.txt

START OUTPUT
12-Speech,
16-Parlament,
23-private
33-publick
36-suppose
41-beginning
58-doubt
62-be
71-be
76-hope,
85-speake.
11-85-1-XXX
88-perhaps
92-dispositions,
95-subject
113-expressions
122-but
126-attempt
144-power
149-passion,
157-Preface.
9-72-2-XXX
170-be
171-blamelesse,
174-be
184-brings
190-promote
193-liberty;
198-propos'd
200-be
207-Trophey.
9-50-3-XXX
END OUTPUT
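
The extraction described above can be sketched as follows. This is a minimal sketch inferred from the INPUT and OUTPUT samples, not the original listing: the subroutine name "extract_pb" and the variables "$pb" and "$last_end" are hypothetical. It assumes that a pb word is any word containing b or p in either case (so "Parlament" counts), and that the total for a sentence is the cusp position minus the previous cusp position (85 - 0 = 85, then 157 - 85 = 72).

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the original program):
# keeps only the words containing b or p and rewrites each cusp as
# "<pb words>-<total words>-<sentence number>-XXX".
sub extract_pb {
    my @lines    = @_;
    my $pb       = 0;    # pb words seen in the current sentence
    my $last_end = 0;    # position of the last word of the previous sentence
    my @out;

    foreach my $line (@lines) {
        chomp(my $item = $line);
        if ($item =~ /^(\d+)-(\d+)-XXX$/) {    # cusp, e.g. 85-1-XXX
            my ($pos, $sentence) = ($1, $2);
            my $total = $pos - $last_end;      # words in this sentence
            push @out, "$pb-$total-$sentence-XXX";
            ($pb, $last_end) = (0, $pos);
        }
        elsif ($item =~ /^\d+-(.+)$/) {        # numbered word, e.g. 12-Speech,
            my $word = $1;
            if ($word =~ /[bp]/i) {            # contains b or p, either case
                $pb++;
                push @out, $item;
            }
        }
    }
    return @out;
}

# Small demonstration on a shortened, made-up numbered list.
my @demo = extract_pb("1-They", "2-publick", "3-be", "3-1-XXX",
                      "4-And",  "5-perhaps", "5-2-XXX");
print "$_\n" for @demo;
```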



4. Program FOUR - Formatting the Bi-labials for further study

INPUT: words containing b or p, with numerical position in the text. Sentence cusps with number of pb words, total number of words, sentence number, cusp marker XXX.

FILENAME: pbnum.txt (SAMPLE ONLY)

START INPUT
12-Speech,
16-Parlament,
23-private
33-publick
36-suppose
41-beginning
58-doubt
62-be
71-be
76-hope,
85-speake.
11-85-1-XXX
88-perhaps
92-dispositions,
95-subject
113-expressions
122-but
126-attempt
144-power
149-passion,
157-Preface.
9-72-2-XXX
170-be
171-blamelesse,
174-be
184-brings
190-promote
193-liberty;
198-propos'd
200-be
207-Trophey.
9-50-3-XXX
213-liberty
217-hope,
234-expect;
235-but
237-complaints
241-deeply
244-speedily
250-bound
253-liberty
9-52-4-XXX
END INPUT



START PROGRAM
END



OUTPUT: words containing b or p, each preceded by the difference in position from the previous pb word and followed by its numerical position in the text. Sentence cusps carry the same fields as in INPUT.

FILENAME: pbnum2.txt

START OUTPUT
diff 11 12,Speech,
diff 4 16,Parlament,
diff 7 23,private
diff 10 33,publick
diff 3 36,suppose
diff 5 41,beginning
diff 17 58,doubt
diff 4 62,be
diff 9 71,be
diff 5 76,hope,
diff 9 85,speake.
11,85,1,XXX
diff 3 88,perhaps
diff 4 92,dispositions,
diff 3 95,subject
diff 18 113,expressions
diff 9 122,but
diff 4 126,attempt
diff 18 144,power
diff 5 149,passion,
diff 8 157,Preface.
9,72,2,XXX
diff 13 170,be
diff 1 171,blamelesse,
diff 3 174,be
diff 10 184,brings
diff 6 190,promote
diff 3 193,liberty;
diff 5 198,propos'd
diff 2 200,be
diff 7 207,Trophey.
9,50,3,XXX
END OUTPUT
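
The "diff" column can be sketched as follows. This is a hedged sketch inferred from the samples, not the original listing: the subroutine name "add_diffs" and the variable "$prev" are hypothetical. Starting "$prev" at 1 is an assumption, chosen only because it reproduces "diff 11" for the first pb word at position 12. The commas follow the OUTPUT sample above.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the original program):
# prefixes each pb word with its distance from the previous pb word.
sub add_diffs {
    my @lines = @_;
    my $prev  = 1;    # assumption: gives "diff 11" for the word at position 12
    my @out;

    foreach my $line (@lines) {
        chomp(my $item = $line);
        if ($item =~ /^(\d+)-(\d+)-(\d+)-XXX$/) {    # cusp, e.g. 11-85-1-XXX
            push @out, "$1,$2,$3,XXX";
        }
        elsif ($item =~ /^(\d+)-(.+)$/) {            # pb word, e.g. 12-Speech,
            my ($pos, $word) = ($1, $2);
            my $diff = $pos - $prev;                 # distance to previous pb word
            push @out, "diff $diff $pos,$word";
            $prev = $pos;                            # carried across sentence cusps
        }
    }
    return @out;
}

# Small demonstration with the first three pb words of the sample.
my @demo = add_diffs("12-Speech,", "16-Parlament,", "23-private");
print "$_\n" for @demo;
```

The cusp pattern is tested first because a cusp line would otherwise also match the more general word pattern.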





5. Program FIVE - Calculating the ratio of PB words for each sentence.

INPUT: The sentence cusps hold the total words in each sentence and the pb words. We are practicing programming.

FILENAME: pbnum2.txt (SAMPLE ONLY)

START INPUT
diff 11 12-Speech,
diff 4 16-Parlament,
diff 7 23-private
diff 10 33-publick
diff 3 36-suppose
diff 5 41-beginning
diff 17 58-doubt
diff 4 62-be
diff 9 71-be
diff 5 76-hope,
diff 9 85-speake.
11-85-1-XXX
diff 3 88-perhaps
diff 4 92-dispositions,
diff 3 95-subject
diff 18 113-expressions
diff 9 122-but
diff 4 126-attempt
diff 18 144-power
diff 5 149-passion,
diff 8 157-Preface.
9-72-2-XXX
diff 13 170-be
diff 1 171-blamelesse,
diff 3 174-be
diff 10 184-brings
diff 6 190-promote
diff 3 193-liberty;
diff 5 198-propos'd
diff 2 200-be
diff 7 207-Trophey.
9-50-3-XXX
END INPUT



START PROGRAM BELOW
END PROGRAM



OUTPUT: Ratio pb/total, pb words, total words in sentence, sentence number.

FILENAME: pbstat2.txt

START OUTPUT
0.13, 11, 85, 1
0.13, 9, 72, 2
0.18, 9, 50, 3
0.17, 9, 52, 4
0.10, 8, 78, 5
0.14, 10, 69, 6
0.15, 19, 123, 7
0.17, 7, 42, 8
0.16, 15, 94, 9
0.14, 11, 78, 10
0.10, 8, 78, 11
0.10, 5, 51, 12
0.08, 4, 48, 13
0.19, 6, 31, 14
0.14, 18, 129, 15
0.19, 19, 99, 16
0.26, 11, 42, 17
0.18, 16, 88, 18
0.22, 11, 49, 19
0.15, 13, 89, 20
0.21, 6, 28, 21
0.09, 5, 53, 22
0.31, 10, 32, 23
0.03, 1, 39, 24
0.13, 12, 89, 25
END OUTPUT
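
The ratio calculation can be sketched as follows. This is a minimal sketch inferred from the samples, not the original listing: the subroutine name "pb_ratios" is hypothetical, and the pattern accepts both hyphens and commas in the cusp lines because both appear in the samples. Rounding halves upward is an assumption made to match the sample, where 9/72 = 0.125 is shown as 0.13; a plain sprintf("%.2f") may round such an exact half differently depending on the platform.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the original program):
# reads the cusp lines and returns "ratio, pb words, total words, sentence".
sub pb_ratios {
    my @lines = @_;
    my @out;

    foreach my $line (@lines) {
        chomp(my $item = $line);
        # Only cusp lines matter here; the "diff ..." lines are skipped.
        next unless $item =~ /^(\d+)[-,](\d+)[-,](\d+)[-,]XXX$/;
        my ($pb, $total, $sentence) = ($1, $2, $3);
        next if $total == 0;                          # avoid dividing by zero
        # Round to two decimals, halves upward (assumption, see lead-in).
        my $ratio = int($pb / $total * 100 + 0.5) / 100;
        push @out, sprintf("%.2f, %d, %d, %d", $ratio, $pb, $total, $sentence);
    }
    return @out;
}

# Small demonstration with the three cusps of the sample.
my @demo = pb_ratios("diff 9 85-speake.", "11-85-1-XXX",
                     "9-72-2-XXX",        "9-50-3-XXX");
print "$_\n" for @demo;
```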




6. Program SIX - Sorting the percentages in descending order. Also selecting sentences over 14%.

INPUT: File with ratios or percentages of pb words.

FILENAME: pbstat2.txt (sample only)

START INPUT
0.13, 11, 85, 1
0.13, 9, 72, 2
0.18, 9, 50, 3
0.17, 9, 52, 4
0.10, 8, 78, 5
0.14, 10, 69, 6
0.15, 19, 123, 7
0.17, 7, 42, 8
0.16, 15, 94, 9
0.14, 11, 78, 10
0.10, 8, 78, 11
0.10, 5, 51, 12
0.08, 4, 48, 13
0.19, 6, 31, 14
0.14, 18, 129, 15
0.19, 19, 99, 16
0.26, 11, 42, 17
0.18, 16, 88, 18
0.22, 11, 49, 19
0.15, 13, 89, 20
0.21, 6, 28, 21
0.09, 5, 53, 22
0.31, 10, 32, 23
0.03, 1, 39, 24
0.13, 12, 89, 25
END INPUT



START PROGRAM
END



OUTPUT 1: Sorted Percentages.

FILENAME: pbstat3.txt

START OUTPUT 1
0.31, 15, 49, 38
0.31, 10, 32, 23
0.31, 4, 13, 105
0.30, 7, 23, 94
0.29, 6, 21, 108
0.28, 7, 25, 56
0.28, 13, 47, 228
0.27, 10, 37, 117
0.27, 6, 22, 168
0.26, 11, 42, 17
0.26, 18, 70, 349
0.26, 9, 35, 171
0.25, 7, 28, 54
0.25, 9, 36, 131
0.25, 3, 12, 146
0.25, 7, 28, 46
0.25, 2, 8, 321
0.24, 8, 34, 212
0.24, 6, 25, 58
0.24, 4, 17, 291
0.24, 8, 34, 356
0.24, 14, 58, 254
0.24, 8, 34, 172
0.23, 10, 44, 151
0.23, 5, 22, 154
END OUTPUT 1

OUTPUT 2: Percentages over 14.

FILENAME:  pbstatsm.txt

START OUTPUT 2
0.18, 9, 50, 3
0.17, 9, 52, 4
0.15, 19, 123, 7
0.17, 7, 42, 8
0.16, 15, 94, 9
0.19, 6, 31, 14
0.19, 19, 99, 16
0.26, 11, 42, 17
0.18, 16, 88, 18
0.22, 11, 49, 19
0.15, 13, 89, 20
0.21, 6, 28, 21
0.31, 10, 32, 23
0.19, 13, 67, 26
0.15, 6, 39, 27
.
.
.
0.23, 8, 35, 206
0.17, 5, 29, 208
0.18, 5, 28, 209
0.24, 8, 34, 212
0.17, 14, 80, 213
0.15, 5, 33, 217
0.15, 6, 40, 218
0.22, 15, 67, 222
0.17, 8, 46, 223
0.22, 11, 51, 224
0.18, 6, 33, 226
0.15, 8, 55, 227
0.28, 13, 47, 228
0.16, 14, 87, 229
0.15, 13, 88, 230
0.17, 6, 36, 231
0.18, 5, 28, 233
0.19, 8, 43, 234
0.21, 6, 28, 236
0.15, 6, 40, 237
0.21, 4, 19, 239
0.18, 14, 76, 243
0.15, 13, 87, 244
0.15, 10, 68, 250
0.19, 18, 95, 251
0.24, 14, 58, 254
0.21, 21, 99, 257
0.15, 14, 95, 258
0.17, 7, 42, 260
0.21, 7, 33, 262
0.17, 8, 46, 264
0.16, 12, 76, 266
0.19, 9, 48, 272
0.15, 4, 27, 273
0.15, 5, 33, 276
0.22, 13, 58, 277
0.17, 5, 29, 282
0.15, 13, 88, 288
0.24, 4, 17, 291
END OUTPUT 2
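
Both outputs can be sketched as follows. This is a hedged sketch inferred from the samples, not the original listing: the subroutine names "sort_desc" and "over_threshold" are hypothetical. The threshold test assumes "over 14%" means a listed ratio strictly greater than 0.14, which agrees with the sample, where sentences at 0.14 are left out and sentences at 0.15 are kept.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption): sort rows by their first field,
# the ratio, largest first. The order among equal ratios is not defined here.
sub sort_desc {
    my @rows = @_;
    my @sorted = sort {
        (split /,\s*/, $b)[0] <=> (split /,\s*/, $a)[0]
    } @rows;
    return @sorted;
}

# Hypothetical helper (an assumption): keep rows whose ratio is strictly
# greater than $min, in their original order.
sub over_threshold {
    my ($min, @rows) = @_;
    return grep { (split /,\s*/, $_)[0] > $min } @rows;
}

# Small demonstration with four rows taken from the INPUT sample.
my @rows = ("0.13, 11, 85, 1", "0.18, 9, 50, 3",
            "0.14, 10, 69, 6", "0.31, 10, 32, 23");
print "$_\n" for sort_desc(@rows);
print "--\n";
print "$_\n" for over_threshold(0.14, @rows);
```

Sorting and selecting are kept as two separate steps because OUTPUT 2 is listed in sentence order, not in sorted order.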
