Page 56 - Cyberculture and New Media
Francisco J. Ricardo 47
______________________________________________________________
Appendix C – The Enron Mail Corpus
Emails commonly contain inserted extraneous text (e.g., news stories from The
Associated Press or Reuters), and this had to be removed so that
the true style of email writing could be examined. The manual distillation
process included the elimination of all person references as well as titles (which are
not part of the body of a text). Incidentally, after controlling for spam or
automatically generated titles (e.g., “Breaking News from
ABCNEWS.com”), “RE:” and “FWD:” prefixes, and repeated entries, the average email
title is 3.56 words in length. A random sample of 500 messages from the Enron email
corpus was cleaned, scanned and parsed for style according to the criteria
indicated below.
1. Repeated or extratextual lines were eliminated (those beginning with
“>”);
2. Reports included in emails were eliminated (e.g., “Energy Executive
Daily”);
3. Words containing “@” were eliminated as potential email addresses;
4. Lines containing email headers (e.g., “From:”, “To:”, “cc:”,
“Subject:”, etc.) were eliminated.
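
The four criteria above can be approximated in a few lines of Python. This is only a minimal sketch under stated assumptions, not the procedure actually used for this corpus: the function name clean_message, the HEADER_PREFIXES tuple and the report_markers parameter are all hypothetical, and the header list covers only the four fields named in criterion 4. Deciding where an embedded report begins and ends probably calls for manual judgment, so the sketch merely drops lines that contain a caller-supplied marker.

```python
import re

# Header fields named in criterion 4; extend this tuple for other fields.
HEADER_PREFIXES = ("from", "to", "cc", "subject")
HEADER_RE = re.compile(
    r"^\s*(" + "|".join(HEADER_PREFIXES) + r")\s*:", re.IGNORECASE
)


def clean_message(text, report_markers=()):
    """Apply a rough version of criteria 1-4 to one email body.

    report_markers is a hypothetical parameter: strings such as
    "Energy Executive Daily" that flag a line as part of an embedded report.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Criterion 1: drop quoted (repeated) lines beginning with ">".
        if stripped.startswith(">"):
            continue
        # Criterion 4: drop lines that look like email headers.
        if HEADER_RE.match(stripped):
            continue
        # Criterion 2 (simplified): drop lines containing a report marker.
        lowered = stripped.lower()
        if any(marker.lower() in lowered for marker in report_markers):
            continue
        # Criterion 3: drop words containing "@" as potential addresses.
        words = [word for word in stripped.split() if "@" not in word]
        if words:
            kept.append(" ".join(words))
    return "\n".join(kept)


sample = (
    "From: a@b.com\n"
    "To: c@d.com\n"
    "Subject: hi\n"
    "> quoted line\n"
    "Please call bob@enron.com today\n"
    "Energy Executive Daily\n"
    "Thanks"
)
print(clean_message(sample, report_markers=("Energy Executive Daily",)))
```

Note that a header test this simple would also discard a body line that happens to begin with “To:” or “From:”, which is one reason a manual pass remains useful.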
The original extraction was of 99,241 words, 493,144 characters on 17,229
lines, the equivalent of 303 pages of text.
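
Totals of this kind can be reproduced with a short script. The sketch below is illustrative only: corpus_stats and words_per_page are hypothetical helper names, words are assumed to be whitespace-separated tokens, and characters are assumed to include spaces and line breaks, none of which is specified above.

```python
def corpus_stats(text):
    """Return word, character, and line counts for a cleaned text."""
    return {
        "words": len(text.split()),        # whitespace-separated tokens
        "characters": len(text),           # includes spaces and newlines
        "lines": len(text.splitlines()),
    }


def words_per_page(words, pages):
    """Average number of words per page."""
    return words / pages


print(corpus_stats("one two\nthree"))
print(round(words_per_page(99_241, 303), 1))
```

With the figures reported above, the second call gives roughly 327.5 words per page.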
Notes
1
One might suppose the case of outlining software as the clear exception.
This class of software exhibits, after all, the swift and ready capacity for
promoting, demoting and reordering items, from lines to entire paragraphs. It
would thus seem the ideal topic processor were it not that what is moved is
only arranged graphically, rather than semantically. The software executes no
rules for identifying, relating, or maintaining coherence among the topics in
the user’s text.
2
Tufte, E., The Cognitive Style of PowerPoint, Graphics Press, Cheshire,
Connecticut, 2003.
3
Byrne, D., E.E.E.I (Envisioning Emotional Epistemological Information),
Steidl Publishing, Göttingen, Germany, 2003.
4
Janzen-Wilde, L., ‘Oral and Literate Characteristics of Facilitated
Communication’, Facilitated Communication Digest, 1993/2, 1993.
5
Ferris, S. P., ‘Writing Electronically: The Effects of Computers on
Traditional Writing’, Journal of Electronic Publishing, 8, 2002.