Page 44 - Cyberculture and New Media

Francisco J. Ricardo                  35
and ideas, and based on those research claims, we would expect to find lower lexical density in oral data than in print, with the density of online texts presumably lying somewhere between the two.
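
To make the comparison concrete, here is a minimal sketch of lexical density in one common operationalization: the percentage of running words that are lexical (content) words rather than grammatical (function) words. It is an illustration only, not the procedure used in this study. The list FUNCTION_WORDS is a small hypothetical subset, and a careful analysis would identify lexical words with a part-of-speech tagger rather than a fixed list.

```python
import re

# Hypothetical, deliberately tiny list of English function (grammatical) words.
# A real study would tag parts of speech instead of relying on a fixed list.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "of", "to", "in", "on", "at",
    "for", "with", "by", "from", "is", "are", "was", "were", "be", "been",
    "it", "this", "that", "i", "you", "he", "she", "we", "they", "not",
    "do", "did", "have", "has", "had", "so", "just", "my", "your",
}


def lexical_density(text: str) -> float:
    """Return the percentage of tokens that are not in FUNCTION_WORDS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return 100.0 * len(lexical) / len(tokens)


print(round(lexical_density("The committee approved the revised budget proposal"), 1))
print(round(lexical_density("well i think it was just that we did not have it"), 1))
```

Under these assumptions, the print-like clause scores about 71% and the conversational one about 17%, the direction expected when comparing print with speech.
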

3. Analysis
    There is much theory on blogging but few empirical studies of the semantics or stylistic composition of blogs (or emails), and in the extant research, methodological weaknesses pose an additional obstacle. In one 2004 study, Herring et al. analysed 203 blogs for linguistic measures, basing their conclusions on the reported number of sentences detected (3260) and words collected (42930).¹¹ However, this study cannot have looked at more than the first page of each blog: in my study of 61 blogs, the scanning program written for the task requested 30 postings from each of the 61 sites, yielding 8726 sentences and 94433 words in total, more than twice as many words drawn from fewer than one-third as many blogs as in the 2004 study. In all, the statistics in my work are based on 522 individual postings. My analysis found the average number of words per post to be 303, well above Herring et al.'s 210. We did, however, broadly agree on the average number of words per sentence: I found 15, Herring et al. 16.
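
Averages such as words per post and words per sentence depend on how words and sentences are segmented. As a minimal sketch, assuming each posting has already been extracted as plain text, the following shows one way such averages could be derived. The regular expressions SENTENCE_BOUNDARY and WORD and the helper names are hypothetical simplifications for illustration; they are not the rules of the scanning program described here or of Herring et al.

```python
import re
from dataclasses import dataclass

# Simplifying assumptions (illustrative only):
# - a sentence ends with one or more of . ! ? followed by whitespace or end of text
# - a word is a run of letters, digits or apostrophes
SENTENCE_BOUNDARY = re.compile(r"[.!?]+(?:\s+|$)")
WORD = re.compile(r"[A-Za-z0-9']+")


def count_sentences(text: str) -> int:
    """Count the non-empty segments between sentence boundaries."""
    return sum(1 for seg in SENTENCE_BOUNDARY.split(text) if seg.strip())


def count_words(text: str) -> int:
    """Count word tokens under the WORD pattern."""
    return len(WORD.findall(text))


@dataclass
class CorpusStats:
    posts: int
    sentences: int
    words: int

    @property
    def words_per_post(self) -> float:
        return self.words / self.posts

    @property
    def words_per_sentence(self) -> float:
        return self.words / self.sentences


def corpus_stats(posts: list[str]) -> CorpusStats:
    """Aggregate sentence and word counts over a list of plain-text postings."""
    return CorpusStats(
        posts=len(posts),
        sentences=sum(count_sentences(p) for p in posts),
        words=sum(count_words(p) for p in posts),
    )


stats = corpus_stats(["Hello there. How are you? Fine.", "One more post here."])
print(stats.words, stats.sentences, stats.words_per_post, stats.words_per_sentence)
```

For these two toy postings the sketch reports 10 words and 4 sentences, or 5.0 words per post and 2.5 words per sentence. Changing the segmentation rules (for instance, how abbreviations such as "e.g." are handled) changes the averages, which is one reason figures from different studies are hard to compare directly.
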
    Herring et al. count the number of paragraphs in their blog corpus, but this measure is problematic in the blog genre. A paragraph, in the realm of conventional print, is a group of one or more sentences separated from its neighbours by one or more blank lines. In web genres, however, paragraph breaks are not used to separate groups of ideas within the same text; instead, they introduce whole new ideas or micro-texts. Moreover, the paragraph break, or, to be precise, a set of empty lines, is overloaded in blog style: it serves as the default marker between blog posts; as the separator between text and graphic elements; as a break between a text and an inserted quote; and as a mere cosmetic device, where inserting white space adds visual balance to existing text blocks. None of these uses is functionally related to the notion of a paragraph boundary.
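
The difficulty can be illustrated with a minimal sketch that applies the print rule (a paragraph is any block bounded by blank lines) to a toy blog fragment. The sample text, and the markers used to recognize a graphic placeholder or a quotation, are hypothetical conventions chosen for the illustration; real blog markup would have to be parsed from HTML.

```python
import re

# A hypothetical blog fragment: two posts, an inline graphic and a quotation,
# all separated by blank lines, mirroring the overloaded uses described above.
SAMPLE = "\n\n".join([
    "First post about the conference. It ran long.",
    "[image: stage.jpg]",
    '"A remark quoted from another blog."',
    "Second post, on an unrelated topic.",
])


def split_blocks(text: str) -> list[str]:
    """Print-style rule: a paragraph is any block bounded by one or more blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def classify_block(block: str) -> str:
    """Label a block using illustrative markers (assumed, not a real blog convention)."""
    if block.startswith("[image"):
        return "graphic"
    if block.startswith('"') and block.endswith('"'):
        return "quote"
    return "prose"


blocks = split_blocks(SAMPLE)
print(len(blocks), [classify_block(b) for b in blocks])
```

The blank-line rule reports four "paragraphs" here, although only two blocks are prose and those two belong to different posts.
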
    A more difficult problem is that of quoted phrases in blogs. Herring et al. give no definition of what constitutes a quoted phrase. Instead, they provide two separate counts, quoted sentences/fragments and quoted words per sentence, without specifying how quotes were counted; yet, given the high intertextuality of blog style, there are at least three ways to encapsulate a quoted phrase. One is the conventional way: placing the desired text within quotation marks. Another is a block of text with margins indented on both sides, for which an HTML tag (<blockquote>) specifically exists. The third is not to include the text at all, but rather to link to it. This ambiguity calls into question the statistical measure they present, the number of "quoted words per sentence", which they find to be 7.6, an almost impossible