Sinology with Python

I’m a bit late to the Jupyter Notebook bandwagon (and bandwagons in general), it seems… Now that I’ve started using it I’m seeing it everywhere.

The Chinese Text Project is one of my favorite websites, because it not only offers literally thousands of premodern Chinese texts online for free, but also provides sophisticated search functions, a built-in dictionary, and other nifty features, many of which were added quite recently.

Among them are an API for accessing the text database programmatically, and a Python module that provides easy-to-use wrapper functions for it. The module is written for Python 3 and doesn’t work with Python 2; if you inadvertently try to use it with Python 2, it simply fails without showing an error message.
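Because the failure under Python 2 is silent, a small guard at the top of a notebook or script can make the problem visible. This is only a sketch: `require_python3` is a hypothetical helper named for this example, not part of the ctext module.

```python
import sys


def require_python3(version_info=None):
    """Raise a clear error when running on a Python older than 3.

    `version_info` defaults to the running interpreter's version; passing a
    tuple such as (2, 7) is only useful for testing the check itself.
    """
    if version_info is None:
        version_info = sys.version_info
    major = version_info[0]
    if major < 3:
        raise RuntimeError(
            "Python 3 is required, but this is Python %d" % major
        )
    return True


# Call this before importing the ctext module.
require_python3()
```

The check uses only syntax that Python 2 also accepts, so that an old interpreter reaches the error message instead of stopping earlier on a syntax error.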

Donald Sturgeon, who is the author of these tools and also the maintainer of ctext.org, has posted some online tutorials using Jupyter Notebooks to show how to access the database, and some simple data analyses that can be done on texts.

Saving R workspace in Jupyter Notebook

I’ve recently been trying out Jupyter Notebook to organize my work. I had been holding out against using Jupyter or its predecessor IPython because I was under the impression that it was only for Python users, but after taking a closer look it seems that you can also use other languages with it if you install the appropriate “kernels”. I now have it on both my work computer (running Linux) and my laptop (running Mac OS X), and it was relatively painless in both cases to get everything running, because the installation can be handled from a package manager. I’ve been using it with R, and the kernel for Jupyter is simply a package that you install from within R, though you have to remember to install the package when running R from a terminal window and not in RStudio.

One problem I encountered with an R notebook in Jupyter, though, was saving my workspace. In a normal R session I’m used to saving my workspace at the end of the session and coming back to it later to pick up where I left off. However, with the Jupyter notebook I found that I had to rerun all the code to regenerate all the objects again! This appears to be an issue for Python notebook users too.

There’s a very simple fix for this: Just run the standard R command


save.image()

Your workspace will then be saved to the usual hidden .RData file in the same folder as the Jupyter notebook. If you want to share the code and the workspace, you’ll have to make sure that you copy both the notebook file and the .RData file that goes along with it.

Likewise, if you start a notebook in a folder that already has an .RData file, you’ll find that you can access that workspace from the Jupyter notebook – just run ls() to see what’s there.
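For Python notebooks, which have the same problem, a rough analogue can be sketched with the standard pickle module: save a chosen set of objects to a file, then load them back in a later session. The function names and the file name workspace.pkl are made up for illustration, and only objects that pickle can serialize will survive the round trip.

```python
import pickle


def save_workspace(objects, path="workspace.pkl"):
    """Write a dict of {name: object} to `path` using pickle."""
    with open(path, "wb") as handle:
        pickle.dump(objects, handle)


def load_workspace(path="workspace.pkl"):
    """Read back the dict of objects written by save_workspace()."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# At the end of a session: save the objects worth keeping.
save_workspace({"counts": [3, 1, 4], "label": "run1"})

# In a later session: restore them into the notebook's namespace.
restored = load_workspace()
globals().update(restored)
```

Unlike save.image(), this saves only the objects you list explicitly, so it is a partial workaround and not a full workspace snapshot.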

I wonder if I may have missed a ‘save workspace’ function that’s already built in, though…

 

Visualize metagenomes in a web browser

In my day job I work with metagenomes from animals and protists that have bacterial symbionts, and I’ve blogged here before about why visualizations are so useful to metagenomics (mostly to flog my own R package). However, most existing tools, including my own, require that you install additional software and all the libraries that come with it, and also that you be familiar with the command line. That’s pretty standard these days for anyone who wants to do serious work with such data, but it can be a big hurdle for teaching. Time in the classroom is limited, and ideally we want to spend more time teaching biology than debugging package installation in R.

I’ve therefore written up a simple browser-based visualization for rendering coverage-GC% plots, called gbtlite. There’s no need to mess around with data structures in R, or to worry about how to install required packages for your operating system. The visualization uses the D3.js JavaScript library, which is popular for web applications. If you’ve played with infographics on the New York Times website, then you’ve probably seen something built with D3.
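For readers unfamiliar with this kind of plot: each contig in an assembly becomes one point, with its GC% on one axis and its read coverage on the other. The Python sketch below only shows how such points could be computed. The function names, toy sequences, and coverage values are invented for illustration, and this is not how gbtlite itself is implemented.

```python
def gc_percent(sequence):
    """Return the GC content of a DNA sequence as a percentage."""
    sequence = sequence.upper()
    # Count only unambiguous bases so that N's do not skew the value.
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    total = gc + at
    if total == 0:
        return 0.0
    return 100.0 * gc / total


def plot_points(contigs, coverage):
    """Pair each contig's GC% with its coverage, keyed by contig name."""
    return {name: (gc_percent(seq), coverage[name])
            for name, seq in contigs.items()}


# Toy data, made up for this example.
contigs = {"contig1": "ATGCGC", "contig2": "ATATATGN"}
coverage = {"contig1": 12.5, "contig2": 40.0}
points = plot_points(contigs, coverage)
```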


More punny business name ideas

Bad puns in business names are rife among German hairdressers, for some reason. There’s even a blog chronicling this plague. Many of the names are bilingual puns, usually revolving around the words “Haar” (hair), “Kamm” (comb), and so on. Off the top of my head, some that I can remember seeing include “Haarmonie” and “Haarlequin”.

However, nothing beats a hair salon I saw in Singapore called “Katamo”, with a sign lettered in that faux-Japanese font that cheap sushi places like to use. In case you don’t get it, think along the lines of “ang moh”…. (apologies to readers who simply don’t know the language.)

If you can’t beat them, join ’em. Here are some more business name ideas, following up on my last post. You saw them here first!


Animated map projections

Vox.com has a new video talking about map projections:

Nothing new here in terms of the history, but their animated transitions between different map projections are really cool. They were rendered using the d3-geo-projection plugin of the dataviz library D3.js by its author Mike Bostock. Compared to GMT, which I’ve blogged about before, there are more classes of projections available if you use the plugin, and it’s designed for web and interactive content, whereas GMT is still very much oriented towards print (PostScript files!) and static images. It’s also more accessible for casual audiences because no special software has to be installed – all modern web browsers support JavaScript. Maybe I could reimplement my lantern globe project in D3.js …

Hosting a website with GitHub Pages

My website seaheuchin.info is a labor of love for me – it is a work of family history and biography that I researched and started to write up in 2007, but only some months ago did I publish it online as a website.


Web-publishing is the poor(er) man’s self-publishing, but it does offer some advantages over paper. I can easily update the site with new features, such as photographs and transcriptions of original documents, and also incorporate interactive features like maps and timelines. It’s perfect for a serial procrastinator like me, because I can make something that is mostly done but not yet perfect available as a working version. For example, I’m still working on text boxes to explain the historical background for something mentioned in the main text of the biography – many of them are already up, but for some I have only an outline of what I would like to write.

In this blog post I’ll explain which free tools (“free” in the sense of not paying money for them, not the “free” in Free Software Foundation) I’ve used to build and host the website. I want to show how a hobbyist with modest web skills, like me, can still get things online quickly and painlessly.
