The serverless components of the Pun Generator Application are described in another post. This post describes the methodology in more detail (with more to come).

The algorithm is rudimentary, with little to no error handling or enhancements. The code executes the following steps:

  • Receives the input word. For now, no treatment is applied to this word. It looks the word up in a dictionary stored in S3, which converts the word to its pronunciation using the CMU dictionary.
  • It checks DynamoDB to see if the word has been searched before. This is a computationally cheap way to avoid repeated calculations; DynamoDB acts as something like a cache.
  • If the word has not been searched before, a list of idioms is pulled from S3. These idioms have been scraped from a few websites that together list a few thousand idioms. The output is limited by the quality of these idioms and by the subsequent step.
  • Each idiom is converted to pronunciation form, and then the distance between the pronunciation of the input word and that of each word in the idiom is calculated.
  • The 10 shortest distances are output and the results are cached.
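
The steps above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the in-memory `PRONUNCIATIONS`, `IDIOMS`, and `cache` objects are hypothetical stand-ins for the S3 dictionary, the S3 idiom list, and the DynamoDB table, and the distance is assumed to be a plain Levenshtein edit distance over phonemes, since the post does not name the metric. Like the description above, it has no error handling (an unknown input word raises a `KeyError`).

```python
from heapq import nsmallest

TOP_N = 10  # number of closest matches to return

# Hypothetical stand-in for the CMU dictionary stored in S3 (word -> phonemes).
PRONUNCIATIONS = {
    "bat": ["B", "AE1", "T"], "cat": ["K", "AE1", "T"], "hat": ["HH", "AE1", "T"],
    "bag": ["B", "AE1", "G"], "let": ["L", "EH1", "T"], "the": ["DH", "AH0"],
    "out": ["AW1", "T"], "of": ["AH1", "V"], "at": ["AE1", "T"],
    "drop": ["D", "R", "AA1", "P"], "a": ["AH0"],
}

# Hypothetical stand-in for the scraped idiom list stored in S3.
IDIOMS = ["let the cat out of the bag", "at the drop of a hat"]

# Hypothetical stand-in for the DynamoDB table (word -> cached results).
cache = {}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def find_puns(word):
    """Return up to TOP_N (distance, idiom, idiom_word) tuples, closest first."""
    key = word.lower()
    target = PRONUNCIATIONS[key]   # step 1: word -> pronunciation
    if key in cache:               # step 2: skip the work if seen before
        return cache[key]
    candidates = []
    for idiom in IDIOMS:           # steps 3 and 4: score every idiom word
        for idiom_word in idiom.split():
            phones = PRONUNCIATIONS.get(idiom_word)
            if phones is not None:
                candidates.append((edit_distance(target, phones), idiom, idiom_word))
    results = nsmallest(TOP_N, candidates)  # step 5: keep the closest matches
    cache[key] = results
    return results


for distance, idiom, idiom_word in find_puns("bat")[:3]:
    print(distance, idiom_word, "in:", idiom)
```

Ties in distance are broken alphabetically by idiom and then by word, simply because tuples compare element by element; a real ranking would probably want a better tie-breaker.
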
The specific architecture is sketched here: Figure