Page 44 - Cyberculture and New Media
Francisco J. Ricardo 35
and ideas, and based on those research claims, we would expect to find lower
lexical density in oral data than in print, and the density of online texts would
presumably lie between the two.
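The comparison anticipated here can be made concrete with a small sketch. It assumes the common definition of lexical density as the share of content (lexical) words among all word tokens. The function names and the very short function-word list are illustrative assumptions, not the instrument used in this study; a serious analysis would rely on a fuller list or a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately tiny list of function words (an assumption for
# illustration only; it is far from exhaustive).
FUNCTION_WORDS = frozenset("""
a an the and or but if of in on at to from by for with as
is are was were be been am do does did have has had
i you he she it we they me him her us them my your his its our their
this that these those not no so than then there here
""".split())


def tokenize(text):
    """Lower-case the text and return runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(text):
    """Return the fraction of tokens that are not in FUNCTION_WORDS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


if __name__ == "__main__":
    print(lexical_density("The cat sat on the mat"))
```

Under this measure, a transcript of speech, a blog post, and a printed passage could each be scored and the three values compared; the ordering expected above would place the blog post between the other two.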
3. Analysis
There is much theory on blogging, but there are few empirical studies of the
semantics or stylistic composition of blogs (or emails). And in the extant
research, methodological weaknesses pose an additional obstacle. One 2004
study, by Herring et al., analysed 203 blogs for linguistic measures. Its
conclusions rest on the reported number of sentences detected (3260) and
words collected (42930).¹¹ However, this study cannot have
looked at more than the first page of each blog, for in my study of 61 blogs,
the scanning program written for that task requested 30 postings from each of
the 61 sites, for a total of 8726 sentences and 94433 words: more than twice as
many words, drawn from fewer than one-third as many blogs as in the 2004 study. In all, the
statistics in my work are based on 522 individual postings. My analysis found
the average number of words per post to be 303, well above Herring et al.’s 210.
We did, however, come close on the average number of words per sentence: I found
15, Herring et al. 16.
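The difference in corpus size can be checked with a few lines of arithmetic. The sketch below uses only the totals reported above; the variable names are illustrative assumptions, and the per-blog figures are simple quotients of those totals rather than statistics reported by either study.

```python
# Totals as reported in the text above.
HERRING_2004 = {"blogs": 203, "sentences": 3260, "words": 42930}
THIS_STUDY = {"blogs": 61, "sentences": 8726, "words": 94433}

# How much larger is this study's word count, and how much smaller its
# sample of blogs?
word_ratio = THIS_STUDY["words"] / HERRING_2004["words"]
blog_ratio = THIS_STUDY["blogs"] / HERRING_2004["blogs"]

# Average number of words collected per blog in each corpus.
words_per_blog_2004 = HERRING_2004["words"] / HERRING_2004["blogs"]
words_per_blog_here = THIS_STUDY["words"] / THIS_STUDY["blogs"]

if __name__ == "__main__":
    print(f"word ratio:            {word_ratio:.2f}")
    print(f"blog ratio:            {blog_ratio:.2f}")
    print(f"words per blog (2004): {words_per_blog_2004:.0f}")
    print(f"words per blog (here): {words_per_blog_here:.0f}")
```

On these totals, the present corpus holds about 2.2 times as many words from about 0.3 times as many blogs. The 2004 totals work out to roughly 211 words per blog, close to the 210 words per post that study reports, which is at least consistent with about one post having been sampled from each blog.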
Herring et al. count the number of paragraphs in their blog corpus,
but this measure is problematic in the blog genre. A paragraph, in the realm
of conventional print, is a group of one or more sentences separated by one or
more blank lines. However, the definition of paragraphs is different in web
genres, where, rather than being used to separate groups of ideas in the same
text, paragraph breaks instead introduce whole new ideas or micro texts.
Moreover, the paragraph, or a set of empty lines, to be precise, is overloaded
in blog style: it serves as the default marker between blog posts; as the
separator between texts and graphic elements; as a break between a text and
an inserted quote; and as a mere cosmetic device where inserting white space
adds visual balance to existing text blocks. None of these uses is functionally
related to the notion of a paragraph boundary.
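The overloading described above is easy to reproduce. The following sketch, with an invented sample text and a hypothetical helper name, counts "paragraphs" the naive way, as blocks separated by one or more blank lines. It is an illustration of the problem, not the counting procedure used by either study.

```python
import re


def blank_line_blocks(text):
    """Split text into blocks separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [b.strip() for b in blocks if b.strip()]


# Invented sample: two short posts, an image placeholder and an inserted quote.
sample = """First post: a short remark about the weather.

[image: sunset.jpg]

    "A quoted passage from another site."

Second post: an unrelated thought."""

blocks = blank_line_blocks(sample)

if __name__ == "__main__":
    for number, block in enumerate(blocks, start=1):
        print(number, block)
```

A blank-line count reports four paragraphs here, yet the breaks mark a graphic element, an inserted quote and a new post; none of them divides groups of ideas within a single text.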
A more difficult problem is that of quoted phrases in blogs.
Herring et al.’s count offers no definition of what constitutes a quoted phrase.
Instead, they provide two separate counts, quoted sentences/fragments and
quoted words per sentence, but these do not specify how quotes were
counted, for, in the high intertextuality of blog style, there are at least three
ways to encapsulate a quoted phrase. One is in the conventional way: by
inserting the desired text within quotation marks. Another is by means of block text
with indented margins on both sides, for which an HTML tag (blockquote) specifically
exists. The third is not to include the text at all, but rather to link to it. This
casts doubt on the statistical measure presented there, the number of
“quoted words per sentence”, which they find to be 7.6, an almost impossible