from the it-was-hard-and-it-took-forever dept
Automating this process means identifying sentences that contain potential euphemisms and follow a particular structure - a "hard natural language understanding problem", say the researchers. Kiddon and Brun began by analysing two different bodies of text - one containing 1.5 million erotic sentences, and another with 57,000 from standard literature.
Apparently, the system is about 70% accurate so far, but they believe they can get it up to 99.5% accuracy before too long.
They then evaluated nouns, adjectives and verbs with a "sexiness" function to determine whether a sentence is a potential TWSS. Examples of nouns with a high sexiness function are "rod" and "meat", while raunchy adjectives are "hot" and "wet".
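For the curious, here is a rough sketch of what a per-word "sexiness" score could look like. The description above doesn't give the researchers' actual formula, so this is an assumption: the score is the ratio of a word's smoothed relative frequency in the erotic corpus to its relative frequency in the standard one. The function names, the smoothing constant `alpha` and the toy corpora are all made up for illustration.

```python
from collections import Counter


def relative_freq(counts, total, word, vocab_size, alpha=1.0):
    """Add-alpha smoothed relative frequency of `word` in one corpus."""
    return (counts[word] + alpha) / (total + alpha * vocab_size)


def sexiness_scores(erotic_tokens, standard_tokens, alpha=1.0):
    """Hypothetical score: how much more frequent a word is in erotic text.

    Assumption, not the researchers' published formula. A score above 1
    means the word is relatively more common in the erotic corpus.
    """
    erotic = Counter(erotic_tokens)
    standard = Counter(standard_tokens)
    vocab = set(erotic) | set(standard)
    e_total = sum(erotic.values())
    s_total = sum(standard.values())
    return {
        w: relative_freq(erotic, e_total, w, len(vocab), alpha)
        / relative_freq(standard, s_total, w, len(vocab), alpha)
        for w in vocab
    }


# Toy stand-ins for the two corpora (the real ones hold 1.5 million
# and 57,000 sentences respectively).
erotic_tokens = "the rod was hot and the meat was wet".split()
standard_tokens = "the table was old and the book was long".split()

scores = sexiness_scores(erotic_tokens, standard_tokens)
for word in ("rod", "the", "table"):
    print(word, round(scores[word], 2))
```

On this toy data, words seen only in the erotic sample score above 1, words common to both score about 1, and words seen only in the standard sample score below 1. A real system would also need part-of-speech tagging to separate nouns, adjectives and verbs, which this sketch leaves out.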
Their automated system, known as Double Entendre via Noun Transfer or DEviaNT, rates sentences for their TWSS potential by looking for particular elements such as nouns that can be interpreted in multiple ways. The researchers trained DEviaNT by gathering jokes from twssstories.com and non-TWSS text from sites such as wikiquote.org.
I'm sorry, Watson, but this may be the biggest computing/artificial intelligence story of the year. And, already, the race is on to come up with the appropriate jokes. My favorite so far is this suggested quote for the researchers on this project: "It was hard and it took forever."