There's Always a Relevant xkcd
It is known that there's always a relevant XKCD comic regardless of the
situation. We prove this with our website! Users simply enter a sentence or
two and the page shows the relevant XKCD comic.
Dan Zhang and Megan Ruthven created this website to exemplify this phenomenon.
Relevant XKCD pulls information from the title and content of each image to
compare against your request.
Try it out by typing in a description of a comic you are looking for and wait
for it to appear before your very eyes! We suggest writing longer sentences,
which give our algorithm more data to work with.
These images are from the original
xkcd online comic. We do not claim these
images as our own work, but we do claim they are awesome!
The idea for this website was conceived by Dan Zhang
with the goal of winning the HackTX hackathon. With Dan working on the back-end and
Megan Ruthven working on the front-end, they came close to their goal, placing 2nd
overall in a field of 64 submitted projects and 500 total participants.
To make this project work, we scraped the excellent site explainxkcd.com,
which contains not only a transcript but also a detailed explanation for every XKCD comic ever created. Using this
information, we form two vectors for every comic, in which each dimension represents the number of times a particular
word occurs in the explanation or transcript. To account for common words such as "the", we normalize the value of each
dimension by the total number of appearances of that word across all comics. For example, if "the" occurs 20,000 times,
then we divide that dimension by 20,000. Similarity between a provided query and a comic is given by the dot product
between the query vector and the comic's transcript+explanation vectors. For more details, you can view our final presentation.
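The scoring described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the site's actual code: the function names are hypothetical, the transcript and explanation are merged into a single text per comic for brevity, and the query vector is assumed to be normalized the same way as the comic vectors.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(comics):
    """Build normalized word-count vectors.

    comics: dict mapping a comic id to its transcript+explanation text
    (merged into one string here, which is a simplification).
    """
    counts = {cid: Counter(tokenize(text)) for cid, text in comics.items()}
    totals = Counter()
    for c in counts.values():
        totals.update(c)
    # Divide each word count by that word's total count across all comics,
    # so very common words such as "the" contribute little.
    vectors = {
        cid: {w: n / totals[w] for w, n in c.items()}
        for cid, c in counts.items()
    }
    return vectors, totals


def score(query, vector, totals):
    """Dot product of the normalized query vector with one comic vector."""
    q = Counter(tokenize(query))
    return sum(
        (n / totals[w]) * vector.get(w, 0.0)
        for w, n in q.items()
        if w in totals  # words never seen in any comic are ignored
    )


def best_match(query, vectors, totals):
    """Return the id of the comic with the highest score for the query."""
    return max(vectors, key=lambda cid: score(query, vectors[cid], totals))
```

Calling `build_index` once and then `best_match` per query mirrors the flow described in the text: index every comic up front, then compare each incoming sentence against all of them.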
After the competition was over, we learned that our algorithm was fairly similar to a well-established algorithm in
information retrieval known as tf-idf. We have since updated
our algorithm to implement tf-idf properly with cosine similarity.
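For readers unfamiliar with the technique, here is a minimal tf-idf and cosine similarity sketch in plain Python. It is not the site's implementation: the function names are hypothetical, and the smoothed idf formula `log((1 + N) / (1 + df)) + 1` is one common variant chosen here as an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every word.

    docs: dict mapping a comic id to its text.
    """
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))  # count each word once per document
    return {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}


def tfidf(text, idf):
    """Term frequency times idf; words outside the vocabulary are dropped."""
    tf = Counter(tokenize(text))
    return {w: n * idf[w] for w, n in tf.items() if w in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs, idf):
    """Return comic ids ordered from most to least similar to the query."""
    q = tfidf(query, idf)
    return sorted(
        docs,
        key=lambda cid: cosine(q, tfidf(docs[cid], idf)),
        reverse=True,
    )
```

Compared with a raw dot product, cosine similarity divides by the vector lengths, so a comic with a very long explanation does not win simply because it contains more words.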
Also, we have now introduced a dynamic learning aspect, in which users can give feedback regarding the accuracy of the
returned comic. This technique uses a Naive Bayes classifier
to choose between the top two returned results. We hope to continue to extend the project in the future and introduce more
advanced machine learning techniques to further refine our results!
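One way such a feedback step could work is sketched below. Everything here is an assumption made for illustration, since the text does not describe the features or training data: the class name `FeedbackReranker` is hypothetical, each comic is treated as a class, the words of user-confirmed queries are the features, and add-one (Laplace) smoothing is used.

```python
import math
from collections import Counter, defaultdict


class FeedbackReranker:
    """Naive Bayes over query words, trained on user-confirmed matches.

    Hypothetical design: it only decides between two candidate comics
    that some earlier ranking step has already returned.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # comic id -> word counts
        self.class_counts = Counter()            # comic id -> confirmations
        self.vocab = set()

    def record(self, query_words, comic_id):
        """Store one piece of positive feedback for a (query, comic) pair."""
        self.class_counts[comic_id] += 1
        self.word_counts[comic_id].update(query_words)
        self.vocab.update(query_words)

    def log_posterior(self, query_words, comic_id):
        """Unnormalized log P(comic | query) with add-one smoothing."""
        total = sum(self.class_counts.values())
        # Prior smoothed over the two candidates being compared.
        logp = math.log((self.class_counts[comic_id] + 1) / (total + 2))
        counts = self.word_counts[comic_id]
        # The extra +1 reserves probability mass for unseen words.
        denom = sum(counts.values()) + len(self.vocab) + 1
        for w in query_words:
            logp += math.log((counts[w] + 1) / denom)
        return logp

    def choose(self, query_words, first, second):
        """Pick between the top two results; ties keep the original order."""
        a = self.log_posterior(query_words, first)
        b = self.log_posterior(query_words, second)
        return first if a >= b else second
```

With no feedback recorded, `choose` returns the first candidate unchanged, so in this sketch the reranker only overrides the original ranking once users have supplied evidence.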