Relevant XKCD

About

It is known that there's always a relevant XKCD comic regardless of the situation. We prove this with our website! Users simply enter a sentence or two and the page shows the relevant XKCD comic. Dan Zhang and Megan Ruthven created this website to exemplify this phenomenon. Relevant XKCD pulls information from the title and content of each image to compare against your request.

Try it out by typing in a description of the comic you are looking for and watch it appear before your very eyes! We suggest writing longer sentences, since they give our algorithm more data to work with.

These images are from the original xkcd online comic. We do not claim these images as our own work, but we do claim they are awesome!

Technical Details

The idea for this website was conceived by Dan Zhang with the goal of winning the HackTX hackathon. With Dan working on the back-end and Megan Ruthven working on the front-end, they came close to their goal, placing 2nd overall in a field of 64 submitted projects and 500 total participants.

To make this project work, we scraped the excellent site explainxkcd.com, which contains not only a transcript but also a detailed explanation for every XKCD comic ever created. Using this information, we form two vectors for every comic, one for the transcript and one for the explanation, in which each dimension holds the number of times a particular word occurs. To account for common words such as "the", we normalize the value of each dimension by the total number of times that word appears across all comics. For example, if "the" occurs 20,000 times, then we divide that dimension by 20,000. Similarity between a provided query and a comic is given by the dot product between the query vector and the comic's transcript and explanation vectors. For more details, you can view our final presentation here.
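The counting scheme described above can be sketched roughly as follows. This is an illustration rather than our actual code: the function names are made up for the example, the transcript and explanation are merged into a single text per comic to keep it short, and the query is left as raw word counts, which is an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vectors(comics):
    # comics: dict mapping a comic id to its text (assumed here to be
    # transcript and explanation merged together).
    counts = {cid: Counter(tokenize(text)) for cid, text in comics.items()}

    # Total number of appearances of each word across all comics.
    totals = Counter()
    for c in counts.values():
        totals.update(c)

    # Divide each per-comic count by the word's corpus-wide total, so that
    # very common words such as "the" contribute less.
    return {
        cid: {word: n / totals[word] for word, n in c.items()}
        for cid, c in counts.items()
    }


def best_match(query, vectors):
    # Score every comic by the dot product with the query's word counts
    # and return the id of the highest-scoring comic.
    q = Counter(tokenize(query))

    def dot(vec):
        return sum(n * vec.get(word, 0.0) for word, n in q.items())

    return max(vectors, key=lambda cid: dot(vectors[cid]))
```

For instance, calling build_vectors on a handful of comic texts and then best_match("the cat", vectors) would favor whichever comic mentions "cat" most distinctively, because "the" is shared by nearly every comic and is scaled down.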

After the competition was over, we learned that our algorithm was fairly similar to a well-established algorithm in information retrieval known as tf-idf. We have since updated our algorithm to implement tf-idf properly with cosine similarity. Also, we have now introduced a dynamic learning aspect, in which users can give feedback regarding the accuracy of the returned comic. This technique uses a Naive Bayes classifier to choose between the top two returned results. We hope to continue to extend the project in the future and introduce more advanced machine learning techniques to further refine our results!
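For readers unfamiliar with tf-idf and cosine similarity, here is a minimal sketch of the ranking step. It is not our production code: the helper names are hypothetical, the idf formula is the plain log(N / df) variant (others exist), and the feedback-driven Naive Bayes step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(docs):
    # Inverse document frequency: words found in fewer documents weigh more.
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {word: math.log(n / d) for word, d in df.items()}


def tfidf(text, idf):
    # Term frequency times idf; words unseen in the corpus are ignored.
    tf = Counter(tokenize(text))
    return {word: c * idf[word] for word, c in tf.items() if word in idf}


def cosine(a, b):
    # Cosine of the angle between two sparse vectors stored as dicts.
    dot = sum(v * b.get(word, 0.0) for word, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    # Return (score, comic id) pairs sorted from most to least similar.
    idf = fit_idf(docs)
    q = tfidf(query, idf)
    scored = [(cosine(q, tfidf(text, idf)), cid) for cid, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Unlike a raw dot product, cosine similarity divides by the vector lengths, so a comic with a very long explanation does not win simply because it contains more words.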