Tuesday, December 14, 2004

The Role of the Author in Topical Blogs

This paper was also submitted to CHI 05.

Abstract: Web logs, or blogs, challenge the notion of authorship. Seemingly, rather than a model in which the author's writings are themselves a contribution, the blog author weaves a tapestry of links, quotations, and references amongst generated content. In this paper, I present a study of the role the author plays in the construction of topical blogs, in particular focusing on how blog authors make decisions about what to post and how they judge the quality of posts. To this end, I analyzed the blogs and blogging habits of eight participants using a quantitative analysis tool that I developed, a diary study, and interviews with each participant. Results suggest that authors of topical blogs often do not merely link to others' material but strive to create new content, often follow journalistic conventions, use the content of their blogs as a reference tool for other work practices, and are connected as a community by a set of source documents. Results also show that Instant Messaging is useful as an interview medium when questions center around online content.

Full text

Saturday, December 11, 2004

Excuse me, where would I get a book

Where was this article when I was writing my final paper?!

Students shun search for information offline

Cites Paul as well...

This Blog's Stats & Information Flows Surrounding the Election

Two Things:
  • Interesting statistical information about who reads this blog and when is available by clicking the little planet image on the right hand side of the blog. Some highlights:

    • The busiest hour and day for traffic was 2-3pm on Wednesdays, right during our class.

    • The distribution of hits per hour of the day follows what I call a "fat-man" or "urn" distribution seen on many other blog stats like SIMS PhD Student danah boyd's blog. (Here's danah's unique stats page.)

    • About 25% of visitors use the Macintosh operating system...

  • My paper is not nearly as cool as all of yours. I examined the information flows surrounding the recent election and found that very few people have what it takes to properly evaluate most of the information flows. Here's my abstract:

    "The general election of 2004 was likely the most highly scrutinized, at all levels, of any election in the history of the United States. The attention the election attracted from the public, academics and activists was particularly heightened due to a number of factors when compared to the usual election-season ramp-up of reporting by large media outlets. A substantial difference from previous elections was the maturation of the “web log” or “blog” – a style of online personal publishing that has attracted a lot of attention in recent years and which essentially allows any person with a personal computer and internet access to easily maintain a vibrant online “newspaper” – and the increase in online political participation. Another substantial difference with the attention paid to this election compared to elections past was the foresight and scrutiny of the academic community who, along with the general public, were largely caught off guard by the fiasco in Florida in 2000 involving the recounting of punchcard ballots. In this paper, I aim to describe and analyze the information dynamic between the public, advocacy and activist organizations, academics, journalists, election officials and voting technology vendors. I find that, amongst all the din, one thing is certain: few people have the ability to reasonably judge the quality of information stemming from these various constituencies."

    If you really want to go through the pain of reading my paper, find it here: "A System That Should Not Be So Broken: The Flow of Information During the 2004 Election"

Thursday, December 09, 2004

Evaluating Information Quality in Mediated Communications: A Case Study of Gay Marriage

I address the following question:
Is the information we are looking at authoritative and how do we
assess it as such?

I looked at 4 different websites suggesting ways to determine
authority online and I compiled my own set of standards, based on my
research into the subject of gay marriage.

I broke it down into four criteria: Accessibility, Timeliness,
Authority, and Objectivity and looked at different media forms,
including books, dictionaries, online news, websites, and blogs.

I was intrigued by this subject for a number of reasons:
-the evolution of the term gay marriage in the dictionary over the
past few years as well as related terms like gaydar, gay pride,
homosexual marriage
-the nature of blogs or personal websites about gay marriage (all
were rooted in some sort of bias, many of which were faith-based,
although, surprisingly, not all Christians and other religious believers
were opponents of gay marriage; some were equally strongly in favor of it)
-bloggers have a strong say in how media and news sources have
evolved; media outlets have to be careful about what they say!

You can read it here.

Academic Plagiarism

For my final paper, I created a smallish survey to be taken by professors and university instructors, about the incidence of plagiarism they've seen in their students (and about what they'd consider plagiarism). Unfortunately, the turnout was a little under what I was hoping (I had 7 responses), but it was still enough for me to draw some interesting conclusions, and kick off some other interesting lines of thought.

A quick summary of my findings:
  1. 6 out of 7 professors have encountered it
  2. It's typically found out by the instructor recognizing the source; another indicator can be if the instructor feels the work is too polished or sophisticated for the student, but that cue is less reliable and can be misleading
  3. Only 1 of the professors uses an automated plagiarism service like Turnitin, but they are starting to become more popular
    1. Turnitin works by comparing texts against a massive database of documents, including all previous papers scanned against it; nice, but not useful if the paper comes from a "guaranteed non-plagiarized" paper mill
    2. Another option, Glatt Plagiarism Services, uses a cognitive technique where every fifth word is removed from the document, and the purported author has to fill in the missing words; also nifty, but you have to already have a suspicion in order to run this test (i.e., you can't just automatically run it on everyone like you can with Turnitin)
  4. There are a lot of gray-area cases as to what professors would consider instances of plagiarism; there are some definitions in the Berkeley student code of conduct, but they're incredibly vague.
  5. There's lots of room for further research.
The survey itself is located here: http://dream.sims.berkeley.edu:8080/jsolomin/survey/index.html
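
To make the every-fifth-word technique from the summary above concrete, here is a minimal Python sketch. It is my own illustration, not Glatt's actual software: the function names, the blank marker, and the exact-match scoring rule are all assumptions, since the summary only describes removing every fifth word and asking the purported author to fill in the gaps.

```python
import re

BLANK = "_____"  # hypothetical marker for a removed word


def make_cloze(text, step=5):
    """Blank out every `step`-th word; return the test text and the removed words."""
    words = text.split()
    removed = []
    for i in range(step - 1, len(words), step):
        removed.append(words[i])
        words[i] = BLANK
    return " ".join(words), removed


def score_cloze(removed, answers):
    """Fraction of blanks filled with the original word.

    Assumed scoring rule: exact match after ignoring case and punctuation.
    Unanswered blanks count as wrong.
    """
    if not removed:
        return 0.0

    def norm(word):
        return re.sub(r"\W+", "", word).lower()

    hits = sum(norm(a) == norm(r) for a, r in zip(answers, removed))
    return hits / len(removed)
```

Calling `make_cloze` on a suspect paper gives the test sheet plus an answer key; `score_cloze` then compares the student's answers against that key. Someone who actually wrote the text should, in principle, recover far more of the missing words than someone who copied it.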

The paper isn't currently in a very web-friendly format, but if I convert it to one, I'll post the link on the blog.

Wednesday, December 08, 2004

Information Filtering Behavior

Here is an abstract for my project:

"Information filtering has been one of the central concerns of information science, especially with regard to information filtering algorithms for information retrieval. However, to date comparatively little research has focused on understanding how people filter information in everyday life through sociocultural and behavioral mechanisms and choices. This paper presents the results of an exploratory qualitative interview study which uncovered a number of sociocultural and behavioral adaptations in relation to information and information technologies for information filtering. Based on these findings, we make the case that further work is necessary to understand specific information filtering behaviors and their connections to existing and emerging technologies. These understandings, we argue, will be necessary for designing appropriate, effective, and sustainable information technologies."

The project itself has several elements, including a paper, an annotated bibliography, and some proposed research documents. Mostly the point was to say that this is an understudied issue, and to present some preliminary data.

The complete project is available here.

Your final project here

Per the discussion in class today, if you're so inclined, please drop a message here with the abstract of your paper and maybe a link to it online. I'm looking forward to seeing what everyone else was working on!

Content Cliques: Content Quality and Site Restrictions


Quality is an attribute that is generally desirable, no less so in the field of information than anywhere else. However, a general notion of quality that spans all communities and all topics seems problematic due to the specificity of each community and their interests. Yet we find that Google, with its homogenized notion of quality (captured in their notion of pagerank) has become the most widely used tool that information workers turn to when they need to find answers on the Web. How does Google’s generic notion of quality apply to the specific requirements of a very specialized (and difficult) topic, and what could be done to improve Google’s effectiveness?

In this paper, I take a community, Buddhist scholars, and a specific question, “What is the relationship between Buddhism and Nihilism?”, and examine how the notion of quality in information searches may be problematic for Google’s generic algorithms. In addition, I present a tool that uses Google search engine technology to perform standard Google queries, but confines their scope to a small collection of vetted sites (that I refer to as a “content clique”). The results are collected and then sorted according to Google’s pagerank. In the process of creating this tool, potential avenues for creating alternative quality rankings become apparent. Some of these alternatives are briefly discussed and methods of implementation are pointed out.
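
For readers who want a feel for the "content clique" idea, here is a rough Python sketch of the two steps described above: restricting a query to vetted sites, then merging and ranking what comes back. It is not the tool from the paper. Using Google's `site:` operator to restrict scope is an assumption about one way to do it, the `score` field is a hypothetical stand-in for a pagerank-like value, and the sketch makes no network calls, so the per-site result lists have to be supplied by whatever search backend you use.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    title: str
    score: float  # hypothetical stand-in for a pagerank-like score


def clique_queries(question, sites):
    """Build one site-restricted query string per vetted site."""
    return [f"{question} site:{site}" for site in sites]


def merge_results(per_site_results):
    """Flatten per-site result lists, keep the best-scoring copy of each URL,
    and sort by score, highest first."""
    best = {}
    for results in per_site_results:
        for r in results:
            if r.url not in best or r.score > best[r.url].score:
                best[r.url] = r
    return sorted(best.values(), key=lambda r: r.score, reverse=True)
```

Swapping the sort key in `merge_results` for something community-specific (for example, a score assigned by the people who vetted the clique) is where an alternative quality ranking would plug in.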


Lowering Information Quality Expectations

This morning I was talking with a friend in the SIMS lounge, discussing the ideas behind the final paper that I've been writing with Dave. At one point, he stopped me and asked if I could help him solve a quality of information problem. He is one of the contributors to a group re-blogging site, unmediated.org. Apparently many of the visitors to the site are upset when they find out that the contributors do not verify the information posted on the site--they simply find interesting news, ideas or projects and post excerpts of other authors' work to their site, with attribution and a link to the original text.

Perhaps unmediated.org looks too professional, or perhaps external content is not clearly labeled--but I read the site, and it appears clear enough to me. It may be that readers trust the contributors to post only 'quality' information, and to filter out 'rotten' information. The issue is not that people cannot tell the difference between the two--if it were, they wouldn't be complaining--it is that the information presented has gained credibility simply by being posted on the site.

How do we lower the expectations of readers on such a site? The contributors still want to be able to post information that may be based on rumors, just to let people see what is out there, but do not want to give the impression that everything is of the same 'quality.' Ideas?

Sunday, December 05, 2004

Class project...

I've trimmed down my previous post to discuss only the project for this class (see palojono.blogspot.com for the previous one):

The Data, Information, Knowledge, Wisdom chain: The Metaphorical link

The DIKW 'chain' was first loosely hinted at by T.S. Eliot in "The Rock." The paper discusses the conceptual differences we have in our understandings of the concepts of data, information and knowledge and how one can be transformed into another. Why can we complain of, for example, 'Information overload' but not 'Knowledge overload?' And how exactly does data become information, become knowledge? The analysis questioned the existence of the suggested chain at all, but found that, at least conceptually, the chain seems to have a clear logical reason for 'existing.'

The paper