Fight Spam, Digitize Books
We’re experimenting with CAPTCHA images from the reCAPTCHA project at Carnegie Mellon, here’s an excerpt from the reCAPTCHA wikipedia entry:
“reCAPTCHA supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, thereby contributing to the digitization project. The result is that the university receives approximately 3,000 man hours per day of free labor to help in the preservation of books.”
Luis von Ahn, who coined the term CAPTCHA back in 2000, is also the guy behind the reCAPTCHA project, not to mention the ESP Game and several other “games with a purpose.”
June 10th, 2008 at 1:22 pm
I saw this the other night and thought, “How Craig!”. And it works well. Hooray!
June 11th, 2008 at 12:15 am
The purpose of this ‘solution’ is to stop spam, but spammers have already proven more resourceful than the reCaptcha project.
Craigslist has effectively outsourced the digitization of books to India, where hundreds of low paid workers sit at computer screens solving CAPTCHAs for a buck an hour….
We already saw this with TicketMaster and now Craigslist has jumped into the fray. The difference is we all thought Craigslist was more socially conscientious than that….
True colors?
June 11th, 2008 at 11:43 am
Sorry, but “CAPTCHA” is a very weak resolve in the fight against spam. This will, no doubt, prove to be ineffective.
June 11th, 2008 at 1:17 pm
bad idea, captcha has been compromised and will soon be history.
June 11th, 2008 at 1:47 pm
Malo is right, the captcha is well on its way out.
With OCR, audio recognition, and cheap labor, it is not effective.
June 11th, 2008 at 4:00 pm
Must be some sort of record: I posted a riding lawn mower on craigslist and got a call, a sale, and the cash in my hand within 2 hours of posting. Thanks, Craigslist. Pete Pona
June 11th, 2008 at 5:04 pm
Today I had a great idea. I may not be the first to come up with it, but here it is. If we treat CL as a database, we can then come up with viewers that express that model. A simple example is as an RSS feed. A more advanced example is a website which has per-user tunable spam filters.
Also, it is imperative that CL start implementing a double-blind email system. The spam over the past week has moved to a delayed “email me here” with a generic message. Using the double-blind system allows CL to monitor the responses for email addresses:
“Hi I’m Mary! How ya doin? Just wanted to mail you cuz I saw your post off Craigslist and I’m interested and would like to talk a bit more with ya. I can get ya pics just mail me back on marysfunlover@gmail.com cuz I don’t use this email addy much. Cya later!”
But the email came from hotmail… This should trip a filter that prevents message delivery and flags the sender/recipient as spam for other members receiving messages using the same addresses.
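A minimal sketch of that filter, assuming the relay can see both the sender's address and the message body. The function names and the hotmail sender address are hypothetical, and real spam would need more than a domain comparison:

```python
import re

# Loose pattern for email addresses embedded in free text (an assumption,
# not a full RFC 5322 parser).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def domain(address):
    """Return the lowercased part of an address after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def redirect_addresses(sender, body):
    """List addresses in the body whose domain differs from the sender's."""
    sender_domain = domain(sender)
    return [a for a in EMAIL_RE.findall(body) if domain(a) != sender_domain]


def looks_like_redirect_spam(sender, body):
    """Flag a reply that asks the reader to write to a different mailbox."""
    return bool(redirect_addresses(sender, body))


# Hypothetical sender; the body is taken from the message quoted above.
sender = "mary@hotmail.com"
body = "I can get ya pics just mail me back on marysfunlover@gmail.com cuz"
flagged = redirect_addresses(sender, body)
```

Any address returned here could then be added to a shared blocklist, so other members receiving mail that mentions the same address are protected as well.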
June 11th, 2008 at 8:32 pm
Something really needs to be done. The spam postings have been exponentially increasing in the past few weeks. The solution will probably come in several levels, but it has to start somewhere. Although I don’t have great confidence in the CAPTCHA system alone, it is a start to fight this ever increasing problem.
June 12th, 2008 at 11:41 am
There is already a decoder out for this. It took 3 days for spammers to solve, and the rest of the spammers that haven’t solved it are using cheap labor here in India. At last check it was $.01 per captcha; it used to be $.05 per captcha, and the price is going lower and lower as volume goes up. You would think that demand would cause the price to go up, but no, the labor force is too large. Living in India I see the exploitation of labor all the time; this is no different. Craig, come up with a different solution. There is a presentation by Matt May at http://www.w3.org/2004/Talks/0319-csun-m3m/slide1-0.html that may help you solve this problem.
June 12th, 2008 at 9:11 pm
Just another suggestion for fighting spam on the personals (for the love of pete check out the w4m personals in NYC - 75% is dating/spam-bait and 90% of these contain a human-readable URL and search-acquired random image).
How bout you make this more usable by simply disallowing URLs in the personal section? A single line of RegEx would curb a large amount of junk. I see absolutely no reason why a URL should be needed for a personal ad (most of which are highly anonymous in nature anyhow).
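Roughly what that one line of RegEx might look like, as a sketch only. The pattern, the short list of top-level domains, and the function name are assumptions for illustration, and a spammer who spells out "example dot com" would still slip past it:

```python
import re

# Matches explicit links (http://, https://, www.) and bare hostnames that
# end in a handful of common top-level domains.
URL_RE = re.compile(
    r"(?:https?://|www\.)\S+"
    r"|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|info|biz)\b",
    re.IGNORECASE,
)


def contains_url(text):
    """Return True if the ad text appears to contain a URL."""
    return URL_RE.search(text) is not None
```

A posting form could call `contains_url` on the ad body and reject the post, or just hold it for review, when it returns True.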
June 13th, 2008 at 7:36 pm
I have seen lots of different Captcha type systems and the current one for Craigslist SUCKS! My vision goes in and out and I often work in cafes and places where the lighting sucks. Craigslist Captcha help text says type in the two words… but, rotating through almost a dozen captchas it looked like many of the ‘words’ are not really words but made up ‘groups of letters’ formed into two words.
These are not common, easily spelled dictionary type words. Other captchas are much clearer and easier to read (some even used subtle shading and colors) but the current provider of captchas for craigslist are not nearly as sharp and easy to read as many others found on the net. And their captchas do not require Sound plug-ins to hear the audible words or letters spoken and they are just as hard for OCR or robots to crack.
On craigslist, this captcha being used will most likely cause more ‘drop off’ and fallout of postings and users because it just got way more difficult to post.
Try looking at other captcha type models and see if they are not better and have much better reviews than the current style being used on craigslist.
I just spent almost 20 minutes and ‘lost’ what I was posting twice just trying to get past the new captcha and I’ve been doing computers and programming for almost 20 years, being quite an expert and skilled on the internet… I cannot imagine how frustrating and what a turn off this new system is for the average user or consumer to use.
June 13th, 2008 at 10:27 pm
Great job! I just saw this… and it’s *wonderful* to see larger sites supporting them.
The real issue is the security needed vs. the possible benefit (like helping a project at CMU). Any captcha can be broken. Supposedly, even the best (Google, etc.) have been. Spam fighting is an active war, with great amounts of labor on both sides. Once a site gets the number of hits that CL does, someone’s going to break whatever is used. I’m glad you’re at least doing some good with the technology.
Keep up the great work!
June 15th, 2008 at 9:01 am
Well, it seems to work well enough - I noticed the sudden drop in spam and went looking to see how yas made it work.
June 16th, 2008 at 6:58 pm
captcha and effectively shut up four regulars in the feedback forum. While you’re at it, ban the words brittle and fragile too. TYVM
June 17th, 2008 at 7:58 am
Hey, have you taken a look at Mollom? This is a very good open source anti-spam project started by the creator of Drupal, and there have already been adaptations of it to various platforms. I’m sure it would be simple enough to get working on cl. See http://mollom.com/
June 18th, 2008 at 9:43 am
The spammers are now back in full force in CL personals. Congratulations on four days of somewhat reduced spam, time to try something else now.
June 18th, 2008 at 5:13 pm
I’m not certain what the answer is for spamming. I wish I had a suggestion that might help. I almost feel bad about that considering how beyond irritated I am with all the spamming that goes on. Something has to be done though. I really like the idea of Craigslist. It’s a cool site, but just can’t be great until the spamming is under control. I may be only one person, but this one person is damn close to giving up on this site. Bet I’m not the only one. Hope you fix the problem before there are too many people as pissed as I am.
June 19th, 2008 at 2:43 pm
Agree with jan…..there were a few hours where the personals were actually a personals site…..now sadly that time is over.
It was a valiant effort.
June 20th, 2008 at 9:33 am
I could see where CAPTCHA would work because I can barely make out the words. I spend more time trying to identify what the word is than I do on the listing. Go back to the old way!
June 21st, 2008 at 12:48 am
This whole reCaptcha verification business does not make sense.
Either someone or something “knows” the right answer to the v-word, or not. If so, then the reCaptcha story is bull. If not, then there is no pass-fail standard for the v-word and it’s not useful directly as a “human detector” as a machine could make bad guesses too.
If OCR cannot read the garbage, and humans are taking guesses (many of which are wrong), what is the standard which says pass/fail? reCaptcha supposedly uses humans at CL and other sites to read the strings to save man-hours in cleaning up digitized texts. But if CL users are the standards then how can the words be used reliably as verification?
Using tougher v-word images makes sense up to a point (the point being rejection of marginal human users), but the reCaptcha story doesn’t add up here.
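For what it's worth, reCAPTCHA's published design answers this by showing two words: a control word whose answer is already known, and an unknown word that OCR could not read. Only the control word decides pass or fail, while guesses for the unknown word are tallied across many users. A minimal sketch of that idea follows; the class and function names and the vote threshold are assumptions for illustration, not reCAPTCHA's actual code:

```python
from collections import Counter


class UnknownWord:
    """Collects human guesses for a word that OCR could not read."""

    def __init__(self, votes_needed=3):
        self.votes = Counter()
        self.votes_needed = votes_needed  # hypothetical agreement threshold

    def record(self, guess):
        self.votes[guess.strip().lower()] += 1

    def consensus(self):
        """Return the leading guess once enough users agree, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


def submit(control_answer, unknown, typed_control, typed_unknown):
    """Pass or fail on the control word only; keep the other guess as a vote."""
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        unknown.record(typed_unknown)
    return passed
```

Since the user cannot tell which of the two words is the control, a guess on the unknown word from someone who got the control right is probably an honest attempt.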
June 24th, 2008 at 1:51 pm
Some suggestions, some may be stupid:
1. Use cookies, the person posting should have the same cookie as the one validating the post.
2. Validate after posting. After posting send to validation screen where a mailed password should be entered, all in the same session. Mail systems are fast, if mail times out then allow sending another one from the validation screen. The HTML in validation screen should be different every time, so automated posting tools won’t be able to parse without new logic.
3. Make the posting logic different every time, so automated posting tools won’t be able to parse without new logic.
4. Limit the number of messages a day from an ip address that has been flagged in a number of different ads
5. When posting display a captcha that has to be entered also in the validation screen
6. Limit the number of postings in the same session
7. Post validation should occur within an hour or two
8. Posts in the ‘Free’ section should last 7 days at the most
Will think more…
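Suggestions 4 and 6 both come down to counting posts per key (an IP address or a session) inside a time window. A minimal sketch, assuming an in-memory store and a hypothetical `PostLimiter` class; the limit and the window length are placeholders:

```python
import time
from collections import defaultdict, deque


class PostLimiter:
    """Sliding-window limit on how many posts one key may make."""

    def __init__(self, max_posts, window_seconds):
        self.max_posts = max_posts
        self.window = window_seconds
        self._history = defaultdict(deque)  # key -> timestamps of accepted posts

    def allow(self, key, now=None):
        """Return True and record the post if the key is under its limit."""
        now = time.time() if now is None else now
        stamps = self._history[key]
        # Forget posts that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_posts:
            return False
        stamps.append(now)
        return True
```

The same object works per session by passing a session id as the key instead of an IP address.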
June 25th, 2008 at 7:46 pm
The most effective spam stopper I’ve seen is requiring the poster to complete a simple math equation. I’m not an expert or anything, but it seems to work.
June 27th, 2008 at 7:28 am
The spammers on this site are relentless and annoying. Although they have patterns that help identify them, some are harder to spot. I do have an idea that may help us identify these dirtbags’ posts, at least for the personal ads with pictures (which the spammers use to lure viewers). It requires all of us to participate when we post our pics and it is rather simple - post more than one pic of yourself. Not one of you and another of your pet (as spammers have done). Wherever the spammers are getting the pics, it is much harder for them to find two different pictures of the same person. Let’s not make this easy for them!
June 27th, 2008 at 4:22 pm
I miss craigslist personals, it has provided me with many nice experiences and opportunities to interact with interesting people. Current levels of spam etc have diluted the valid responses past the point of diminishing returns.
Personally, I would be willing to authenticate out-of-band, i.e., receive an authentication code, txt’ed to my cell phone. I would be OK supplying my cell phone number and getting a code to enter for both posting and replying.
Linking participation to something sticky like this will hobble the spammers.
June 30th, 2008 at 6:23 pm
Now, can you concentrate on writing non-run-on sentences?
“We’re experimenting with CAPTCHA images from the reCAPTCHA project at Carnegie Mellon, here’s an excerpt from the reCAPTCHA wikipedia entry:”
July 14th, 2008 at 1:30 pm
CAPTCHA seems to work well for the spammers - maybe it is just this old lady that has problems with it. Gees 3, 4, 5 tries to get one I can read…
July 18th, 2008 at 9:47 am
This is a sick thing, CAPTCHA. I do have a life and it is not playing word games that can take 20-30+ tries or not and give up. Some people DO NOT live all day in a room in front of a computer. Thank You
July 18th, 2008 at 9:48 pm
Doesn’t work, Garbage, tried more than 30 times, still can’t get my ad posted…..Someone needs to fix it…
July 22nd, 2008 at 8:12 am
Sometimes… it works. But NOT today.
There must be a better way.
I gave up posting after several attempts.
I know for a fact that I typed the word correctly
There are times that the words are hard to read.
Looks (at times) as if it has a comma at the end.
The wavy line makes it even more difficult to read.
At times I see only part of a letter and must guess..
Needs some work!
July 22nd, 2008 at 2:36 pm
I wrote a song about Craigslist w4m ads, which includes CAPTCHA. Would it be okay if I posted it here? If nothing else, you may get a good laugh, or at least agree with it.
July 23rd, 2008 at 10:15 am
I found the system TOTALLY UNUSABLE!
The system rejected the “confirmation code” half a dozen times, so I gave up trying to post anything.
This is MONUMENTALLY STUPID!
July 23rd, 2008 at 11:26 am
reCAPTCHA, and its evil mother the CAPTCHA are discriminatory against those with dyslexia and other disabilities, not to mention the fact that their usefulness has been negated by spammers.
Luis Von Ahn, there is a special place in hell for you. If you want me to help you digitize books, you better pay me for my time and effort.
Otherwise it is like all these goddamned “no pay” or internship jobs that really should belong in their own category so people who actually want to get paid don’t have to sift through all the garbage.
July 23rd, 2008 at 7:44 pm
First time trying to post an item on Craigslist in a few months and I have come across this new recaptcha thing. What a joke, I am so frustrated. I need to post and have tried the visual and verbal at least 20+ times. I know I am right and it is not going forward. Any suggestions? Thanks
July 24th, 2008 at 5:42 am
Tried to post this morning and lo and behold, 2 tries on each ad and finally got it posted. Thanks
July 26th, 2008 at 10:42 am
I like the concept, hate how it (doesn’t) work!!! I always type it in correctly but it won’t let me post! I’ve tried about 2 dozen times, including starting from scratch again, and then assuming it’s caps sensitive. Please sort out the bugs!!
July 26th, 2008 at 10:53 am
I have been trying for several minutes now, and I’m completely unable to get past this system. A couple times I have been certain of what the words are, but when I enter them I get rejected. I have tried the sound feature as well, and can’t get through that either. This system is making it impossible to post, which is especially frustrating given that spam sites seem to be getting through it even while actual humans cannot.
July 26th, 2008 at 10:53 am
This captcha is horrible. I really wanted to post but couldn’t after (I believe) I typed the words in correctly. I thought it was me, but am convinced after trying multiple times to post that the program doesn’t work consistently. Maybe spammers are having more luck placing their ads over legit ads. I’m disappointed.
July 26th, 2008 at 11:05 am
I’ve now typed in about 100 captchas correctly and it says I’m wrong every time. As for the audio, no clue what they are trying to do with that. Am I Michael Keaton in White Noise?
Guess it was good while it lasted.
July 28th, 2008 at 4:28 am
How does the 7th age of computing affect this?
July 28th, 2008 at 10:26 pm
I tried about 30 times and give up, I used to use craigslist all of the time but I give up and will be looking for a better place to put up ads without the headache.
July 29th, 2008 at 9:34 am
Re-Captcha or Re-CRAPcha?
I live in an area that does not and will probably never get high speed cable. So basically I’m on dial-up and it takes a while for screens to refresh. I think I’ve spent well over an hour and a half typing CORRECTLY the words I’m supposed to type! I’m all for helping out the whole book “research” thing but this is flippin’ irritating! The bugs should have been worked out before starting it up….why do we have to be the guinea pigs? I agree with what another said about getting paid for doing the work for Re-Captcha. I got two babies…and not enough time…thanks for taking away what spare time I have just to post a “Free item to whoever needs it” ad.
July 29th, 2008 at 6:12 pm
I had trouble with this CAPTCHA thing while posting. I had to try about 14 times before I was able to decipher the scribble. Is this intentional?
I understand it’s to help eliminate spam, but you have to try something else.
July 29th, 2008 at 9:46 pm
Here’s a solution to the spam problem:
Find the bastards and sue them until their faces turn blue.
Make Craigslist risky for spammers. That’ll scare them off.
July 31st, 2008 at 9:44 pm
i hate captcha. captcha has made it impossible for me to post to CL from home. for some reason it does not accept my translations from my mac laptop at home, but on the pc at work it accepts my answers. i am extremely frustrated. of course i can translate these words, i tried probably 100 times over 4 days, even tried the audio version several times, i have emailed recaptcha and CL my concerns to no avail, i have gotten no replies from either of them. i am not a spammer, i just want to sell my extra stuff. please switch to a system that will let me work from my mac at home. i hate captcha.
August 4th, 2008 at 4:48 pm
I’m having trouble too?
It says: “You have not entered the verification word correctly. Please try again.”
But I know I have entered it correctly, over and over.
CL seems to love leaving users in the dark about what’s happening or why something isn’t.
WHY WHY WHY ISN’T IT WORKING?
August 4th, 2008 at 6:30 pm
It is a lousy system. I usually have to cycle through it several times until I find a pair of words that I can actually make out. While English is a robust language, your users’ vocabularies are not infinite and many test phrases include words we seldom, if ever, see or use (assuming we can actually read them), making it easy to enter them incorrectly. Surely there is a more user-friendly system you could use. This one is user-hostile.
And I’d like to second comments regarding Craigslist’s utter lack of responsiveness to user queries and complaints. Nice little black hole you’ve got there for an Inbox. Stuff goes in; nothing comes out. Terrific.
August 11th, 2008 at 9:09 pm
Somebody mentioned using a simple math problem. Some examples I’ve seen are like this: three x four = ___ (give numerical answer) Or it could be something like 3+5=eight or 7-5= two. This could change every time somebody signs in.
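A minimal sketch of that kind of question, using number words in the prompt and expecting a numerical answer. The function names are hypothetical, and nothing here says how well this would hold up against a determined bot:

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "x": lambda a, b: a * b,
}


def make_challenge(rng=random):
    """Return (question text, expected integer answer)."""
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    op = rng.choice(sorted(OPS))
    if op == "-" and b > a:
        a, b = b, a  # keep subtraction answers non-negative
    question = f"{NUMBER_WORDS[a]} {op} {NUMBER_WORDS[b]} = ___ (give numerical answer)"
    return question, OPS[op](a, b)


def check_answer(expected, submitted):
    """Accept only a numeric answer equal to the expected value."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

Calling `make_challenge()` on every page load gives a different question each time somebody signs in.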
August 12th, 2008 at 4:07 pm
I’m disabled and can’t use the captcha process. my options are to have my 5 y.o. grand child do the deed or pay for a hacker program that will defeat the process.
Neither option speaks well for the technical quality of ‘protection’ afforded.
August 13th, 2008 at 4:10 am
Sorry but captcha will soon be history.
August 14th, 2008 at 2:41 pm
Simple solution: track down and kill spammers. If they have so successfully overpowered the system that they can get away with it with impunity, then we can overpower and swamp the judicial system into not prosecuting spammers’ murders. The illegal borderjumpers overpowered the system. We live in a day and time of system-swamping. Just kill the spammers–excess humanity.
August 19th, 2008 at 7:45 am
The question becomes, what is the future?
August 21st, 2008 at 3:51 am
So if this stops spammers how do we stop the stupid e-mails about people telling me they’re going to buy my car with a check and someone is going to pick it up. Because I get 5 of those every time I try and sell my car??????
September 24th, 2008 at 5:41 pm
There is still no way to combat flakes and people who never do what they say on craigslist.
October 2nd, 2008 at 11:47 am
This is a fantastic idea to harness the otherwise wasted time of people solving CAPTCHAs. If computers can solve these CAPTCHAs then that is great news for the AI/visual processing community. If they can’t, then this is an effective way to stop spammers. People’s concerns about cheap labor in India to solve these CAPTCHAs are still well founded though, because you don’t have to solve very many to saturate craigslist with spam.
October 4th, 2008 at 6:06 am
Ultimately, there is someone downstream that buys something that the spammers’ customers are trying to sell. It’s at that point where the credit card companies can be used as a legal mechanism to shut down accounts of (or reverse charges to) those companies who are sending spam. The government has already shut down international gambling in this way, so there is a precedent for legally enrolling the CC companies in the fight. The final piece of the legal mechanism required to make this work is an unambiguous “opt-out” which cannot be circumvented by any of the wrangling that even legitimate commercial companies try to do with email. The answer: anyone that has “optout” as part of their email address automatically removes permission for anyone to send them unsolicited mail. If it happens, the user then invokes the legal mechanism above and has the credit card company “reverse charge” for using their email address, simply by using the purchasing mechanism set up by whatever means the spammer’s customers are trying to sell their wares.
October 11th, 2008 at 2:52 pm
you guys must be kidding me with this captcha … I tried 5 times before I could get it right … it’s not possible to understand wtf is written there. Get real!
October 12th, 2008 at 11:12 am
I’m embarrassed to admit that occasionally I have trouble reading the CAPTCHAs. I get screened just for poor vision, and I’m not even spam.
January 29th, 2009 at 3:27 pm
Can you guys make a blog post addressing Craigslist’s decision to go “NoFollow” on all sites?
February 8th, 2009 at 2:05 pm
tired of mystery words
i have a big problem with the verification mystery words that we have to enter to place any ad on craigs. just placed an ad and had to try at least 30 times before the system accepted my guess. i know at least half of my attempts were correct. i can’t be the only one that has these problems. how do we actually let someone at craigs know that we dislike this system and not just complain to each other. thanks.
March 6th, 2009 at 4:43 pm
Our blog http://blog.gogopin.com doesn’t use captcha - and we just might start. It’s unbelievable the number of bot generated comments that we receive on our blog posts. Mostly comprised of 8pt. illegal sordid text. Not helpful in the least. Until then this human, myself, is keeping captcha unemployed!
March 12th, 2009 at 8:22 am
Captcha wards off spammers and bots but sometimes makes it harder for users to post. Well, for a site this big where posts come in by the millions, captcha helps a lot. Go for reCaptcha.
April 24th, 2009 at 3:52 am
I think captcha works wonders. Lots of different options. I think the best one is when you add a basic math calculation.
June 11th, 2009 at 2:15 pm
I’ve capitulated - we do use reCaptcha now - and it works very well (see my comment above before implementation).