Study Says 75% of All Blog Pings Are Spam
According to a study performed by the eBiquity Research Group at UMBC (the University of Maryland, Baltimore County), nearly three out of every four pings to blog ping servers come from splogs, or what they're calling spings! They also found that more than 50% of claimed blogs pinging weblogs.com are splogs. UMBC is also kindly providing hourly updated statistics on blog pings vs. spings on its Web site. Here's a snapshot that shows just how big the splog problem is.

Clearly this issue is bigger than most people imagine, despite what David Sifry says. This must be solved now. Who besides Mark Cuban is taking the lead on this? The future of the blogosphere is at stake here. This has to be addressed at the publisher level. Does anyone care about this, or is everyone busy building new features?

Since sploggers happen to understand the technology better than most, it makes sense that they would use and abuse the tactic. In addition, since producing splogs is a very scalable practice, it's quite logical that they make up such a large percentage.
Posted by: graywolf | Thursday, December 15, 2005 at 06:31 PM
This is serious business and something I'm going to address with the group at Business Blog Consulting. However, the fact that I'm the first to comment must indicate that it's not a matter of great priority just yet, or that blogs are still not being taken seriously enough.
Ultimately, I wonder if it will require some type of government regulation similar to what email marketing experienced with the CAN-SPAM Act of 2003.
Posted by: Paul Chaney | Thursday, December 15, 2005 at 06:52 PM
"Who besides Mark Cuban is taking the lead on this? The future of the blogosphere is at stake here. This has to be addressed at the publisher level. Does anyone care about this or is everyone busy building new features?"
I can only speak for us personally, not as company policy, but we at Six Apart care a lot. We think splogs are one of the reasons a lot of blog indexers have had a hard time keeping up with the rate of updates in the blogosphere, and we also think it's an unfair burden for publishers to pass along to both users and other companies.
We've made a system called the Six Apart Update Stream (this is still a work in progress, so that page is kinda geeky/ugly), and what it does is deliver high-quality data for indexers or blog search systems to use. Right now it includes LiveJournal's millions of bloggers, and it will soon include TypePad's as well. That's roughly 10 million bloggers that we can say aren't likely to be splogs.
We've been told by blog indexers that they've wanted to block other tools and services at times because they feel like the overwhelming majority of content being posted with those tools is from splogs, and honestly one of the reasons I'm glad we have a paid service is because that means part of the value we can deliver is that you're not in the midst of an overwhelming number of spammers who're being presented as your peers.
I'd never claim we've been flawless on this, but I don't think there's any other large blog publisher who's doing better than us, and I hope that meets your (and the community's) expectations in the matter. I'd love to hear more from the indexers about what this problem costs them in terms of labor, spam filtering costs, wasted indexing time, and overall system responsiveness.
Posted by: Anil Dash | Thursday, December 15, 2005 at 09:16 PM
The study found that 75% of pings to Weblogs.com are spam, as determined by the group's own classifier. Weblogs.com is now owned by VeriSign, a company that plans to resell the ping data, scrubbed of spam, for reuse by other services.
Technorati is busy squashing spam as well as building new features. Spam in our index means increased storage costs and longer query times for our users. If we drop 25% of a possible index weight by fighting spam effectively, that's a lot less data to search through resulting in a faster response as well as better results for our users.
Technorati has organized two web spam summits to coordinate efforts in the publishing and indexing industries and share trends and best practices for fighting spam. We're all definitely not standing still.
Antispam techniques need to constantly change and adapt to a moving target as new tools and attempted techniques emerge. Six-month-old data does not necessarily reflect the current state of any industry, and we've seen big changes such as Google's increased focus against Blogger and AdSense creation and exploitation and Technorati tossing a lot more spam out of its index.
It's all about thrilling the searcher.
Posted by: Niall Kennedy | Thursday, December 15, 2005 at 09:19 PM
Chinese?
Posted by: Jim | Thursday, December 15, 2005 at 09:50 PM
That's fascinating... not that most pings are actually spings (what a great word) but that someone took the time to measure this and address it. I wonder if they've presented this information to Verisign. One of the things they said they were interested in doing when they bought Moreover and Weblogs was to address the splog issue, according to their post on the Verisign Infrablog.
http://infrablog.verisignlabs.com/2005/10/verisignmorever.html
I bet they'd be interested in this.
(And UMBC is my alma mater. Yay, them.)
Posted by: Tinu | Thursday, December 15, 2005 at 11:11 PM
Matt Mullenweg posted similar numbers from an analysis of Ping-O-Matic traffic in the Feedmesh group on Yahoo about a year ago.
The problem, IMHO, is that there is absolutely no accountability in the ping process. It's a situation where technology has made something too simple with few repercussions for abusers. So, your splog becomes blacklisted on a splog filter? No problem! The ease with which you can set up ten, twenty, forty new blogs and start all over is amazing.
Unfortunately, the problem is not the splogs, it's the 'sploggers' and our current splog-centric approach to fighting the problem is bound to fail. Until we realize that we need to target those who are developing the splogs, I don't think that a solution is bound to come about anytime soon.
Why not establish an account-based system that can be used by legitimate bloggers?
First provide an incentive. Give anyone who is subscribed to an account priority in listings on the tracking services and search results. Consider charging a nominal one-time fee for the generation of this account number so that bloggers maintain all (or most) of their blogs under a single ID.
Then create a disincentive for abusers. Require the ID to be present in both the head section of the blog's html as well as in the ping that is sent to the tracking service in order to receive listing priority. Abuse the system at the risk of having your ID, not your blog, blacklisted.
I'd be happy to incorporate the transmission of a 'blogger id' into Pong, and I'm sure that this would be relatively simple to include in the web-based ping applications that are out there if there is some support for this idea among the tracking services. My guess is that the blogs that I prefer to read are the ones that would participate in such a system and those that I have to wade through to find those blogs are the ones that would not.
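To make the two-part check concrete, here is a minimal sketch of how a tracking service might decide whether a ping earns listing priority. Everything named here is an assumption for illustration only: the `blogger-id` meta tag, the `ping_priority` function, and the three outcome labels are hypothetical, and no existing ping service is known to work this way.

```python
from html.parser import HTMLParser

# Hypothetical meta tag name; the proposal above does not specify one.
META_NAME = "blogger-id"


class HeadIdParser(HTMLParser):
    """Records the content of <meta name="blogger-id"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.blogger_id = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head and self.blogger_id is None:
            attr = dict(attrs)
            if attr.get("name") == META_NAME:
                self.blogger_id = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def id_in_head(html):
    """Return the blogger ID declared in the page's head, or None."""
    parser = HeadIdParser()
    parser.feed(html)
    return parser.blogger_id


def ping_priority(ping_id, page_html, blacklist):
    """Classify one ping as 'rejected', 'priority', or 'normal'.

    ping_id   -- ID sent along with the ping (None if absent)
    page_html -- HTML of the blog that was pinged
    blacklist -- set of IDs banned for abuse
    """
    if ping_id in blacklist:
        # The ID, not the individual blog, carries the penalty.
        return "rejected"
    if ping_id and id_in_head(page_html) == ping_id:
        # The ID appears in both the ping and the page head.
        return "priority"
    return "normal"
```

Because the blacklist is keyed on the ID, every blog registered under an abusive account loses its standing at once, which is the disincentive the proposal relies on.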
Posted by: Chris Simpkins | Saturday, December 17, 2005 at 12:35 AM
Could this be why my blog isn't indexed or tagged properly? I've tried everything to meet the right qualifications for Technorati, i.e., XHTML and CSS well-formedness, but my posts never show up in its engine.
I used to be indexed with no problem a month or so ago...which makes me wonder if I did something to offend?
Posted by: Brad Isaac | Saturday, December 17, 2005 at 05:26 PM
Could this be the reason my blog isn't indexed in Technorati properly? Have I been labelled a splogger? I've tried all the suggestions to make my blog XHTML and CSS compliant. But still my posts stopped showing up in the engine about 3 weeks ago.
Posted by: Brad Isaac | Saturday, December 17, 2005 at 05:53 PM
I do feel the blogosphere is feeling the effects of these splogs. I've even come across a service for reporting splogs, but it seems to have lost some steam.
I believe, as you've already mentioned, it's because not many people are vocal about it yet.
Methinks it would be a good idea to manually verify a blog before it can use the update services. This to me would be a feasible deterrent.
What splogger would want to manually verify hundreds, or even thousands, of weblogs individually before they were allowed to ping the servers?
Services like ping-o-matic and weblogs.com make it entirely too easy for splogs to ping them, and as a result, their channels are all clogged.
While I love the ease of their services, if it meant that my blogs wouldn't get lost in a sea of useless tripe, I wouldn't be opposed to verifying that the blogs I have pinging them aren't splogs.
While a fee structure is plausible, I believe it would place an unnecessary burden on those people who aren't planning to make a living from their blog(s).
Some people blog for therapy, to be heard, to find others who are going through similar things in life and to connect with them.
If they need to pay a fee for that, they will most likely just forsake it, whereas the splogmasters (who can afford it) will pay for the ability to sping the servers.
Just my opinion of course.
Posted by: Teli Adlam | Wednesday, December 28, 2005 at 02:05 PM