Translations Abstraction Proposal for WordPress

plugins.svn.wordpress.org:

  • There is a new special folder in Plugin Repositories, at /assets/i18n/
  • A /assets/trunk.pot file should probably be maintained for support of repositories where trunk is their public release version.
  • Whenever a new tag is created, for example /tags/3.2-beta1, either a script (automatically) or the plugin author generates the po/mo source files and stores them as /assets/i18n/3.2-beta1.pot or the like.
  • I’m torn as to who generates this, and I have no desire for this to generate additional changesets on our already burdened plugins.svn.wordpress.org — perhaps they could live elsewhere, and wouldn’t necessarily need to be under version control.
  • A /assets/i18n/master.pot file (or the like) may be maintained as a merger of all the strings in all the versions in all the tags.
    • This keeps a string in the master index even if it is dropped from one version to the next.  It could also simplify storage: a request for a plugin’s translations would not need to specify a version number, and GlotPress wouldn’t need to store fifty copies of the same string for a plugin that has fifty tagged versions.
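
As a rough illustration of the master index idea above, here is a minimal JavaScript sketch.  It assumes each tagged version's strings have already been extracted into a plain array; mergeIntoMaster and the sample data are hypothetical, not part of GlotPress or any existing tool.

```javascript
// Hypothetical sketch: build a master index of every string seen in any tag.
// Each string is stored once, along with the set of versions that use it.
function mergeIntoMaster(stringsByVersion) {
    var master = new Map();
    Object.keys(stringsByVersion).forEach(function (version) {
        stringsByVersion[version].forEach(function (msgid) {
            if (!master.has(msgid)) {
                master.set(msgid, new Set());
            }
            master.get(msgid).add(version);
        });
    });
    return master;
}

// Made-up sample data: 'Save' is in both tags, 'Old label' was dropped later.
var master = mergeIntoMaster({
    '3.1': ['Save', 'Old label'],
    '3.2-beta1': ['Save', 'New label']
});
```

The dropped string stays in the index, and the shared string is stored only once no matter how many tags contain it.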

plugins.glotpress.wordpress.org

  • I know very little about the inner workings of GlotPress currently, so I’ll leave this part of the proposal as a ‘black box’ that magically works.
  • Sharing identical strings between plugins would be amazing, if possible, but with the ability to break the link if one plugin author needs it to be translated differently.  But I suppose that can be done with _x() and notes for translators.
  • API requests for translations shouldn’t by default be given up-to-the-second results.  If there’s a cached version from the last 24 hours, or it hasn’t been invalidated with any new translations yet, just serve that version up.
  • Gzip it all in transit & storage.
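
The caching rule above could be sketched roughly like this.  The function name, the shape of the cache entry, and the timestamps are all assumptions made for illustration; nothing here reflects how GlotPress actually works.

```javascript
// Hypothetical sketch of the caching rule: reuse a cached response if it is
// under 24 hours old, or if no new translations have arrived since it was built.
var DAY_MS = 24 * 60 * 60 * 1000;

// entry:             { builtAt: <ms timestamp> } or null if nothing is cached
// lastTranslationAt: ms timestamp of the newest translation for this plugin
// now:               current ms timestamp
function canServeCached(entry, lastTranslationAt, now) {
    if (!entry) {
        return false;
    }
    var isFresh = (now - entry.builtAt) < DAY_MS;
    var notInvalidated = lastTranslationAt <= entry.builtAt;
    return isFresh || notInvalidated;
}
```

Only when the cached copy is both older than a day and has been invalidated by newer translations does the request need to be rebuilt.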

core

  • Tentatively, this would be implemented as a plugin using the 'override_load_textdomain' filter, which would then query the API and store the translations either in a transient/option or in a folder within /wp-content/.
  • If not using a transient, set a wp_cron task to check for updates every X days or weeks, or on upgrades, installs, or manually pushing the update translations button.
  • Store a version number for the most recently received translations, and pass that back with subsequent queries, so it only receives the strings that have been updated since the last pass (huge potential savings on bandwidth and server processing time).
  • On either the client or the server side, round the version number down to a given interval (thousand, ten thousand?) so that it can be cached more easily on the server side.  A couple of duplicate translations could get delivered, but that’s a small price to pay for the savings in processing time.
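
To make the version rounding concrete, here is a small sketch.  It assumes translations carry a numeric version that increases with each change; the function names and the record shape are hypothetical.

```javascript
// Hypothetical sketch: round the client's last-seen version down to an
// interval so that many clients end up sending the same, cacheable request.
function roundDownVersion(version, interval) {
    return Math.floor(version / interval) * interval;
}

// Return only the translations changed after the rounded-down version.
// `translations` is an assumed array of { msgid, text, version } records.
function updatedSince(translations, sinceVersion, interval) {
    var floor = roundDownVersion(sinceVersion, interval);
    return translations.filter(function (t) {
        return t.version > floor;
    });
}
```

A client that last synced at version 12345 with an interval of 1000 is treated as being at 12000, so anything changed between 12001 and 12345 is sent again.  Those are the duplicate translations mentioned above.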

Make your nav stick to the top of the screen when you scroll past it!

A pretty simple JS include to use when you don’t already have jQuery or the like included on a page.  It just adds a data-nav="fixed" attribute to your body element that you can style off of via body[data-nav="fixed"] .whatever {}.  This also has the benefit of not hardcoding any styles: you’re free to do it all via your CSS files.

Just remember to swap out ‘nav’ for whatever ID or selector you’ve got on your actual nav.

Potential optimizations would be not using document.getElementById over and over, and just leaving the element cached in a global.  I would advise against caching nav’s offsetTop, though, as it’s possible that things may change in the DOM, and that could change the offset!

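// Toggle data-nav on <body> depending on whether we've scrolled past the nav.
window.onscroll = function() {
    var d = document,
        nav = d.getElementById( 'nav' ),
        // Read offsetTop on every scroll, since DOM changes can move the nav.
        att = ( window.pageYOffset > nav.offsetTop ) ? 'fixed' : 'normal';
    d.body.setAttribute( 'data-nav', att );
};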
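
The optimization mentioned above (looking the element up once, but still reading offsetTop fresh on every scroll) might look like the following sketch.  bindStickyNav is a hypothetical name, and it keeps the element in a closure rather than a global.  Passing window and document in as arguments is only an assumption to make the snippet easy to test outside a browser.

```javascript
// Hypothetical helper: look up the nav element once, but read offsetTop on
// every scroll so that layout changes are still picked up.
function bindStickyNav(win, doc, navId) {
    var nav = doc.getElementById(navId);
    win.onscroll = function () {
        var att = (win.pageYOffset > nav.offsetTop) ? 'fixed' : 'normal';
        doc.body.setAttribute('data-nav', att);
    };
}

// In a browser you would call: bindStickyNav(window, document, 'nav');
```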

Final-ish legislation.sql table structure

The final (until I add a sponsors table) db structure for the legislation.  Works with the existing import script, with added indexes for easier querying.

Legislation DB Dump

Still not quite the final DB structure I’d like, but this is available for data mining and trying to build something awesome out of.

OpenDataDay Hackathon DC!

So I went down to DC this weekend to participate in the Open Data Day Hackathon!  There were some tremendous projects proposed, but the one that caught my eye from the start of the day was one proposed by Jim Harper of the Cato Institute to track down the genealogy of legislation put forth in Congress.

Basically, the goal is to programmatically find similar passages in multiple bills.  This can be used for many purposes, including looking at sections in large omnibus bills and getting an idea of whether the things that get shoehorned into them have been proposed previously, and what happened then.

So, our team largely consisted of myself, Alexander Furnas of the Sunlight Foundation, and John Bloch of 10up, with guidance from Jim Harper (previously mentioned, of the Cato Institute), Molly Bohmer (also of the Cato Institute), and Kirsten Gullickson providing some clarification on the way the XML data we were working with was structured.

I spent my time building a MySQL database and a PHP import script that could map all the relevant data from the XML files into it.

Alexander worked primarily in Python, fleshing out a way of doing Latent Semantic Analysis on the data we’ve extracted to sort out what is similar to what, and where we can find meaning in it.

John spent his time working on a front-end for the final dataset, to help end-users get something useful out of the data we’re building.

The data that we were pulling from can be readily accessed by anyone through the Library of Congress at the following URLs:

I’m currently putting some finishing touches on the DB structure, but when that’s done, I’ll be releasing that and the import script in a subsequent post, as well as a SQL dump for the final accumulated and sorted data — ripe for data mining.  As the day was wrapping up, I had someone come to me inquiring about data mining for references to money allocated in appropriations bills and the like, and I was able to very quickly do a MySQL query along the lines of

SELECT * FROM `resolution_text` WHERE `text` LIKE '$%'

to find anything that started with a dollar sign and then listed an amount over a very limited data set of three million rows or such.  The final data set will be much larger.