Showing posts with label internet. Show all posts

Tuesday, February 3, 2009

More Publicity for my Wikipedia Work

On Friday, I received an e-mail from a British dude wanting to talk about my Wikipedia work. I spoke with him on the phone for about 45 minutes that afternoon. That call and his other interviews resulted in this article in The Independent. Two passages quoted me:

How big is the problem really? Reid Priedhorsky, who studies Wikipedia and similar social projects at the University of Minnesota, estimated in a recent paper that the chances of any one visitor seeing a damaged Wikipedia page are about one in 140, as the average time it takes to repair damage is less than three minutes, and even less for heavily tracked pages. However, there are still more than 100,000 damaged pages at any given time, vandalism appears to be on the increase and it is impossible fully to measure the scale of the problem.

"It's the monster in the closet. You know that it has not grown bigger than the closet and busted down the door, but you don't know exactly how big it is in there," Priedhorsky said. However, the most startling fact about Wikipedia remains how accurate it is, not how inaccurate.

"As a researcher, I'm baffled that it works, but Wikipedia is one of the wonderful things that has happened in the 21st century. Many hands make light work. There are millions of people who edit Wikipedia, and many of them track changes to the pages they are interested in. I have 43 pages on my watchlist, for example, covering subjects I know things about. Any controversial edit is likely to be quickly seen by many people."

The long quote is an amalgam of various things I said during the interview... when I said those things, they weren't all together and didn't read so oddly.

Also, the figure of 100k pages damaged at any one time didn't come from my paper, and it's surely wrong. I'm not sure where he got it.

Second passage relevant to me:
The foundation's finances are the biggest single threat to Wikipedia, according to Reid Priedhorsky. "A successful community artefact like Wikipedia requires strong buy-in from the community, which I'd wager is much harder to achieve under a for-profit model," he said.
The monster-in-the-closet quote referred to ax-grinding (i.e., the people whose views prevail on Wikipedia are those who have the most time to edit), not vandalism. I liked the description of who I am, which noted that I study Wikipedia but also other things. I don't want to be typecast as someone who just works on Wikipedia.

Thursday, December 11, 2008

Summary

In case you missed the last half-dozen blog posts that I didn't write, here's a summary:
  1. I attended the CSCW conference in San Diego, where I presented my paper, "Computational Geowikis: What, Why, and How", which was nominated for Best Paper and earned an honorable mention. My talk was very well received and cited as the "best talk of the conference" by at least one stranger.
  2. I demoed Cyclopath to a standing-room-only crowd of ~50 city planners and other governmental types. They liked it.
  3. I attended an HCI symposium in New York and gave a talk.
  4. I gave my oral preliminary exam and passed, making me a Ph.D. Candidate. It will be official when all the paperwork that I was supposed to bring to the event is signed. ("Where's the paperwork?" "What paperwork?")
  5. I moved reidster.net to a virtual private server, which is way faster and more reliable. E-mail service will follow soon (not that you'd notice any difference).
  6. I bought a fancy battery charger with all kinds of buttons and modes. It's great.
  7. My FW made incredible toffee, and I ate it.

Tuesday, December 2, 2008

Mozilla Developers Hate Saved HTML Manuals

My website, Cyclopath, is written in Flex, Adobe's development environment, which produces apps that run in the Flash Player. Needless to say, I occasionally need to reference the Flex documentation, which is HTML. I can browse it either on Adobe's website (slow) or save their zip file and browse it off the hard disk (fast). Guess which I chose?

Anyway, recent versions of Firefox 3 break these docs: pages in the doc package are no longer allowed to reference the JavaScript they need, because it lives in a higher directory, so all the helpful links like "Show Inherited Public Properties" don't work.

And here's the error message that shows up in the error console:
Error: uncaught exception: [Exception... "Security error" code: "1000" nsresult: "0x805303e8 (NS_ERROR_DOM_SECURITY_ERR)" location: "file:///export/scratch/reid/flex3.2.0/doc/langref/asdoc.js Line: 493"]
...impossible to Google. Bah.

Wandering around the Mozilla bug database a little (using, frankly, a lot of expertise that most people don't have) revealed that, in fact, this is by design.

You can turn off the behavior by going to about:config and setting the secret configuration variable security.fileuri.strict_origin_policy to false. Mozilla developers are not interested in making this more obvious and don't believe that anyone other than a web developer needs to change it, despite the large installed base of on-disk HTML user manuals for a variety of things.
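
If you'd rather not click through about:config by hand, the same preference can also be set from a user.js file in the Firefox profile directory, which Firefox reads at startup. Here is a minimal Python sketch that appends the line. The function name relax_file_origin_policy is purely illustrative (nothing Mozilla ships), and it assumes you have already located your profile directory yourself.

```python
from pathlib import Path

# The pref named in the text above, in user.js syntax.
PREF_LINE = 'user_pref("security.fileuri.strict_origin_policy", false);'


def relax_file_origin_policy(profile_dir):
    """Append PREF_LINE to user.js in the given profile directory.

    Hypothetical helper for illustration. Returns True if the line was
    added, False if user.js already contained it.
    """
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text() if user_js.exists() else ""
    if PREF_LINE in existing:
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with user_js.open("a") as f:
        f.write(prefix + PREF_LINE + "\n")
    return True


if __name__ == "__main__":
    import sys

    # Usage: python relax_policy.py /path/to/firefox/profile
    added = relax_file_origin_policy(sys.argv[1])
    print("added pref" if added else "pref already present")
```

Firefox only picks up user.js when it starts, so a restart is needed afterwards. Keep in mind that this loosens a security check for every local file, not just the docs.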

I don't fully understand the security reasons for this change. I assume they're sound. But what an absurdly opaque failure mode.

Shame, Mozilla!

Saturday, October 11, 2008

Backscatter Spam Explosion

Wednesday morning, I woke up to a huge e-mail inbox. Both my inbox and my spam folder were clogged with thousands of unwanted e-mails, and the mail system (I run my own e-mail server) was groaning under the load.

What happened? "Backscatter". Someone had sent off a big load of spam with my e-mail as the return address, so I got all the bounces from the misconfigured servers out there that believed I'd really sent the junk -- 15,000-20,000 of them, I think.

So... I spent the morning cleaning up this garbage. I had to disconnect my mail server from the Internet (to stop the continued flood), and disable my spam detection (SpamAssassin) because that seemed to be a bottleneck.

One of the related problems was that if placing an e-mail in my inbox failed (which many did because the system was so clogged up), that would cause ANOTHER e-mail to be sent to me notifying me of the problem... sigh.

Here's a screenshot of Thunderbird in the middle of the mess. I had already sorted through maybe half of the unwanted mails.



Anyway... bottom line, it was a crummy morning. Lessons learned:
  1. backscatterer.org is wonderful. This blacklist lets me simply ignore many/most misconfigured systems that want to give me backscatter spam.
  2. Do not, repeat, do not use a lockfile for your SpamAssassin procmail recipe. This is why mail was not getting through. SpamAssassin takes several seconds to process an e-mail, and because I had it set to use a lockfile, only one SpamAssassin instance would run at once. In other words, I could only receive ~1000 e-mails per hour on a sustained basis before some e-mails were at risk of being dropped, and in a backscatter or spam flood like this, the rate is much higher. Here is the recipe I use now:


# Send mail through SpamAssassin. Note that we do NOT use a lockfile (unlike
# many examples on the net) in order to avoid timing out delivery under
# sustained spam barrages (we do use lockfiles below to serialize the actual
# delivery into folders).
:0fw
* < 262144
| /usr/bin/spamassassin

(Note: Yes, I should be using spamd, and I plan to, but I haven't gotten to it yet.)
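
The ~1000 e-mails per hour ceiling in lesson 2 is just arithmetic: with a lockfile, only one scanner runs at a time, so capacity is 3600 seconds divided by the seconds each message takes. Here is a rough Python sketch, assuming 3.6 seconds per message (a stand-in for "several seconds") and a made-up flood of 5000 messages per hour. Both numbers and the function names are illustrative, not measurements.

```python
def max_sustained_rate(seconds_per_message, concurrent_scanners=1):
    """Messages per hour that can be scanned without a backlog building up."""
    if seconds_per_message <= 0:
        raise ValueError("seconds_per_message must be positive")
    if concurrent_scanners < 1:
        raise ValueError("need at least one scanner")
    return concurrent_scanners * 3600 / seconds_per_message


def backlog_after(hours, arrivals_per_hour, seconds_per_message,
                  concurrent_scanners=1):
    """Messages still waiting after `hours` of steady arrivals."""
    capacity = max_sustained_rate(seconds_per_message, concurrent_scanners)
    return max(0.0, (arrivals_per_hour - capacity) * hours)


if __name__ == "__main__":
    SCAN_SECONDS = 3.6   # assumed; the post only says "several seconds"
    FLOOD_RATE = 5000    # assumed messages per hour during the flood

    # With a lockfile: exactly one scanner at a time.
    print("capacity, serialized:", max_sustained_rate(SCAN_SECONDS))
    print("backlog after 2 h:", backlog_after(2, FLOOD_RATE, SCAN_SECONDS))

    # Without a lockfile: several scanners can overlap (4 is arbitrary).
    print("capacity, 4 scanners:", max_sustained_rate(SCAN_SECONDS, 4))
```

Under those assumptions a single serialized scanner tops out near 1000 messages per hour, and every hour of a 5000-per-hour flood leaves about 4000 messages waiting. Dropping the lockfile is what raises concurrent_scanners above 1, at the cost of more CPU and memory in use at once.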