DHC Weekly 2/15: JSTOR Text Analyzer

Hello DH-ers! Do you remember a few weeks ago, when I introduced you to a new tool from JSTOR? In that post, I mentioned that JSTOR’s other tool in beta, the Text Analyzer, hadn’t been working for me. Well, after many emails exchanged with an infinitely patient project manager at JSTOR (thank you Michael!!), the issue has been sorted, and I am so excited to tell you all about the Text Analyzer!

The JSTOR Text Analyzer is a tool that allows you to use your own documents to search JSTOR for articles and books — just upload or drag and drop a document onto their portal, and an algorithm will scan it for key terms and conduct a search for you. Then, you can add or remove terms to correct any imprecision on the part of the algorithm. Obviously, this is a pretty cool party trick, but algorithms are far from infallible, so I’ve been excited to play around with the tool and see how effective it is at actually gauging what’s relevant.
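For the curious, here is a minimal sketch of the general idea in Python: pull candidate terms out of a document, let a human add or remove some, then turn what is left into a search. To be clear, this is not JSTOR's algorithm, which this post does not describe. The frequency-counting approach, the tiny stopword list, the function names, and the query format are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use far larger ones).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "as", "for", "on", "that", "this", "with"}

def extract_terms(text, limit=5):
    """Return up to `limit` of the most frequent non-stopword words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(limit)]

def adjust_terms(terms, add=(), remove=()):
    """Drop unwanted terms and append new ones, skipping duplicates."""
    kept = [t for t in terms if t not in remove]
    for t in add:
        if t not in kept:
            kept.append(t)
    return kept

def build_query(terms):
    """Join terms into a simple OR query string (hypothetical format)."""
    return " OR ".join(f'"{t}"' for t in terms)

sample = ("Glitch memes and clowns: the glitch is a carnival, "
          "and the carnival is a glitch.")
terms = extract_terms(sample, limit=4)
# The human-in-the-loop step: correct what the naive count got wrong.
terms = adjust_terms(terms, add=["gestures"], remove=["clowns"])
print(build_query(terms))
```

Even this toy version shows why the add-and-remove step matters: raw word counts have no idea which terms are the point of a piece and which are incidental.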

screencap of the Text Analyzer interface -- it tells you to upload your document to search

As a first pass, I tried uploading the text of a post I wrote for the DHC site a couple months ago, on Shakespeare and video game glitch memes as embodiments of Bakhtin’s grotesque. Here’s what the Text Analyzer tagged as the keywords of that essay:

the text analyzer results, explicated in the following paragraph

As prioritized terms we have “Video Games”, “Virtual avatars”, “Animation”, “Clowns”, and “Carnival”, with a more general spread of things having to do with games and computers under “Topics”. Shakespeare, which I would call the jumping-off point for the whole piece, doesn’t show up until you get down to “People”, and even there the algorithm can’t really tell the difference between “Shakespeare was mentioned in this document because it was written in response to a paper on Shakespeare” and “James Bond was mentioned in this document because one of the glitches discussed came from the GoldenEye video game.” So from here I might tweak: take out “Clowns” from my prioritized terms, perhaps, select “Gestures”, and weight “Virtual avatars” more strongly than “Video Games”.

It’s worth noting, however, that the goal here is not necessarily the neat trick of putting in your work and getting out perfectly tailored search terms (although that’s pretty cool). The purpose of a tool like this is to find writing that could be in productive conversation with the work you already have, and it absolutely does that, even before you make any refining changes.

The Text Analyzer also says that you can upload an image, something that piqued my interest; I was curious whether having to OCR text from an image would add an additional layer of error. I tried it with the flier image for the DHC’s Algorithms of Oppression reading group event (sidebar: if you RSVP’d to this please come pick up your book!), and got pretty much a slam dunk of the keywords “Library collections”, “Search engines”, “Digital humanities”, “Oppression”, and “Algorithms”.

Where I had the least luck was when I uploaded a full 21-page research paper, a piece of writing I’d personally identify as pretty solidly about Antony and Cleopatra and performance, but for which the Text Analyzer’s output terms were “Ancient Rome”, “Roman and Byzantine Egypt”, “Art happenings”, “Mystery plays”, and “Arias”. I think the correct conclusion to draw here is that the Text Analyzer is most practical given shorter and more to-the-point copy: less your manuscript than your précis. But either way, it’s a fun tool!
