Tuesday, June 03, 2008

Organising my online reading

This time, the title refers not to blogs (where I haven't figured out the answer yet) but to scholarly articles.

Like most scientists I know, I tend to read journals online, accessing their webpages with a browser (usually Mozilla Firefox), reading either the HTML or the PDF on my screen, and only rarely bothering to take a printout. Many trees have probably been saved this way.

Also like most scientists I know, when writing papers I use bibliography software (in my case BibTeX; Microsoft Word users are likely to use EndNote) to organise my references, and I maintain a database of the papers I refer to.

The question is, what happens in between? There are two problems here: I want to save my reading material in a systematic way, so that I will find it again when I want to; and I want to save the citation information for it so that I can easily reference it in my own writing.

Saving things systematically is not my strong point. Ideally, I would save each paper to a subdirectory named after its topic and give the file an informative name, so that I could locate it with a simple directory listing. If a paper belonged to multiple topics, I would make symbolic links to it in all the relevant subdirectories. I'm sure it would work well. Instead, I end up saving everything to a directory named "papers" (or, worse, to my desktop) under the original filename, which could be something like "10.1371_journal.pcbi.0020053-L.pdf". Good luck finding that again. When I do need the paper again, I end up searching PubMed or Google Scholar for it.
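The ideal scheme described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `file_paper`, the library location, and the topic names are all hypothetical, and symbolic links assume a Unix-like filesystem.

```python
import os
import shutil
from pathlib import Path


def file_paper(pdf, library, name, topics):
    """Move `pdf` into library/<first topic>/<name>.pdf and
    symlink it from every other topic directory.

    All arguments are hypothetical illustrations: `library` is the
    root directory for saved papers, `name` is the informative
    filename (without extension), `topics` is a non-empty list of
    topic names.
    """
    library = Path(library)

    # The first topic holds the real file.
    primary_dir = library / topics[0]
    primary_dir.mkdir(parents=True, exist_ok=True)
    target = primary_dir / f"{name}.pdf"
    shutil.move(str(pdf), str(target))

    # Every further topic gets a relative symbolic link to it,
    # so the whole library can be moved without breaking links.
    for topic in topics[1:]:
        topic_dir = library / topic
        topic_dir.mkdir(parents=True, exist_ok=True)
        link = topic_dir / target.name
        if not link.exists():
            link.symlink_to(os.path.relpath(target, topic_dir))

    return target
```

A call such as `file_paper("10.1371_journal.pcbi.0020053-L.pdf", "papers", "some-informative-name", ["topic-a", "topic-b"])` would leave the file in `papers/topic-a/` and a link to it in `papers/topic-b/`. The informative name still has to be typed by hand, which is exactly the step that tends not to happen.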

As for citations, my alternative to the extreme tedium of manually entering each BibTeX entry into my database was to search for the paper on sites such as Hubmed (a PubMed front-end that can export to BibTeX format and do other nice things). In practice, this was not tedium-free either.

Such was my workflow until recently. Now I have a better solution: Zotero.

I'm sure I'm late to the party and lots of people are using it already, but here's a description for the uninitiated. Zotero is a Firefox extension: once installed, it adds a "zotero" button at the bottom right of the Firefox window which, when pressed, pops up the Zotero interface (or hides it again). It captures bibliographic information about the page you are currently viewing and saves it to a database. Capturing is as easy as clicking an icon that shows up in your URL bar. (It's not restricted to scholarly journals: it works with news articles from the New York Times or BBC News, for example.) Each item in that database has numerous fields: the usual bibliographic ones (title, author, journal name, etc.), but also web links, notes, attachments and tags. It automatically extracts tags from some articles (via their "keywords" or equivalent section), but you can also specify your own. You can search your articles, filter them by tag, and do various other neat things, most of which I haven't explored. Most importantly, you can export citations in BibTeX format (and also in EndNote and various other formats).
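To show what an exported BibTeX record looks like, here is a minimal Python sketch that turns a handful of bibliographic fields into an entry. It is not Zotero's exporter: the function name `bibtex_entry` is an assumption, and the citation key and field values in the example call are invented placeholders.

```python
def bibtex_entry(key, entry_type="article", **fields):
    """Format bibliographic fields as a BibTeX entry string.

    Illustrative only: no escaping of special characters is
    attempted, and field values are wrapped in braces as-is.
    """
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder values, not a real reference.
print(bibtex_entry(
    "doe2008example",
    author="Doe, Jane",
    title="An Example Title",
    journal="Journal of Examples",
    year=2008,
))
```

The printed entry can be pasted into a `.bib` file and cited with `\cite{doe2008example}`. Typing these fields by hand for every paper is the tedium that automatic capture removes.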

Zotero works with Firefox 2, and the latest version also works well, in my experience, with Firefox 3 RC1, but has glitches with the previous release (beta 5) of Firefox 3. If you are overwhelmed by scholarly reading matter, give it a try.


Patrix said...

Google Scholar also lets you import citations into EndNote or BibTeX. Of course, Zotero is useful if you are not using Scholar to look for the articles in the first place.

Rahul Siddharthan said...

Does Google Scholar export citations? I don't see the buttons/links. Clicking on the article takes you to an external site (which may or may not allow exporting citations).

Also, searching Zotero is much faster than searching Google Scholar since it's all local.

Anonymous said...

You are doubtless aware of the problems that integrating a database with Firefox has recently caused.

I think it would be much better if the database was separately created and managed. There are a number of different packages that seem to fit the bill --- and export to BibTeX.

The browser plugin would then use some API to access this database.

I must confess to not having looked at Zotero. Perhaps it already does this.

Anonymous said...

[Off-Topic] Note that www.blogger.com comment pages are broken for text-mode browsers. When I select open-id as the commentator identifier and put in my openid URL, the page comes back with "URL not supplied". My guess is that this is due to some kind of broken usage of JavaScript. (My browser is w3m.)

Rahul Siddharthan said...

Kapil - the problems with integrating SQLite in Firefox were with ext3, not with Firefox itself (though obviously they will need to attend to it for the sake of ext3 users). I had observed the problem on my laptop and not on my work machine, and only after reading about it did the reason become clear: my work machine uses XFS for my /home filesystem, and the fsync problem doesn't occur there. (It could just be that XFS lies about fsync, like some other filesystems do, but I doubt that.)

Separately creating and managing a database means manually entering all the fields -- unless the standalone database can access a web URL and do it automatically. But in that case, why not just use Firefox/Zotero?

As for Blogger, I have a bunch of peeves about it. What alternative do you recommend? I was very close to moving to WordPress at one point, but didn't get around to it, partly because some of my peeves got addressed by Blogger, partly out of inertia.

NoteScribe said...

I was reading through your post and was thinking, "this person should try Zotero", and of course, now you have! It's a great program. I use it alongside my note taking software, NoteScribe. The two combine for a great research force, and keep me well organized in my job and in my personal life. It's a good combination for notes and bibliography references.

NoteScribe: The Premier Note Software

Rahul Siddharthan said...

Jake - thanks for the ad, but I'm the wrong market, I'm afraid: I don't use Windows and I use almost no proprietary software (I think Adobe Reader and the Flash plugin are the only exceptions at the moment).