Archive for March, 2011

There Is No Medicine Fairy

I intend for this to be a programming and technology blog, so please forgive a rare excursion into matters political. This matter affects me and my household in a very personal way, so I feel rather strongly about it. I will return you to my regularly scheduled programming shortly.

Those of you who know me know that my wife has been struggling with Multiple Sclerosis for many years (it’s no match for her, you should know). Much progress has been made in slowing the disease’s progression, but so far, there have been no treatments to counteract the symptoms — at least none with high-quality studies demonstrating efficacy. Well, recently, two doctors from the Multiple Sclerosis Center at Rush University Medical Center in Chicago have gotten FDA approval for such a drug, after nearly 30 years of work. For some time the drug was available only in special cases, and from specially licensed compounding pharmacies, but it is now generally available. Here’s a Chicago Tribune article about it, and you can find out more by Googling “Ampyra”:

Medication improves walking for many sufferers
(Note: it’s not obvious, but the article is split into four pages, with page-by-page links at the bottom.)

This is Big News, and it made me stop and reflect on how new treatments like this come into being.

Effective treatments don’t have to exist. There is no magical fountain of new cures, no tree that sprouts forth new medical knowledge if you just water it, and there is no cure fairy. Research-intensive drugs are incredibly difficult and expensive to develop. This one took uncommon dedication, unwavering belief, and a ton of private investment fueled by little but hope. It also took the commitment of major resources and capital by a pharmaceutical corporation, towards a project with high risks of a small or negative payout.

It’s easy to feel embittered and burdened by the high costs of advanced drugs, and it’s easy to feel a sense of entitlement to their benefits when we know they exist. With a little effort and rhetorical license, I could weave our situation into an anecdotal sob story that would make people say “somebody has to do something about this.” And let’s be honest… nobody wants to see anyone else denied medicines because they lack the means to afford them. But it’s good to stop and marvel at the fact that the medicines of today even exist.

Much money and effort go into pharmaceutical research projects that result in nothing. Before anything like this comes into being, countless people have to finance an intimidatingly expensive education — one that requires formidable effort to complete — and that just gets them to the starting line. Many of them don’t succeed in getting jobs in their field of choice. After some do, lots of brilliant people need to decide to embark on research that may or may not go anywhere before anything new gets created. These people need labs, equipment, collaborators, access to others’ research, and more. They also need to pay off student loans, eat, have a roof over their heads, and have competent management to help organize their efforts effectively. All of these people have retirement plans to be funded, as well as their own healthcare plans.

When they finally do find a fruitful line of research, their discovery must go through a brutal series of safety tests and regulatory approvals. Once over that hurdle, it takes efficient manufacturing capabilities and distribution channels to get the product in our hands. Then they need to retain lawyers and secure staggering amounts of liability insurance to handle lawsuits over potential side effects — and there will be lawsuits; sometimes they’re merited, sometimes they’re not, but there’s no avoiding them either way.

It’s so easy, tempting, natural even, to feel like I have a right to any medicine, at a price within my means, simply because it exists and I need it or my family does. But what does that really mean? If I have a right to a product that I cannot afford, from whom shall I take it? Medicine is not a ubiquitous natural resource like air. It is the fruit of many other people’s efforts and investments. If this drug costs $12,000 to discover, produce, insure, and to create enough profit to incentivize its producers and to provide for investment in the next 100 research efforts, but I have only $100 to spend on it for whatever reason, do I have the right to take $11,900 away from the people who bring it into existence? If they refuse to make it for $100, can they be punished for violating my rights? If they quit and move to a more lucrative industry, can they be dragged back to manufacture the product I have a right to? What is their 30 years of effort worth, and now that we know it paid off, would it be right for us to just seize the results from them, or to tell them how much they may value those 30 years at?

If we don’t take that approach but we still insist that this product is something we have a right to, then can I take it from some arbitrary person who can afford it better than I can? And if that’s so, can someone with lesser means than me take enough from me to pay for a product I can afford and he cannot? Is it just and moral to say that Mary cannot have an advanced robotic prosthesis at all unless Sue can have free casting and X-rays when she breaks her leg? Logically, the only way I can have a right to a product — something produced by others — is either to have compulsory labor (people can take products from those who make them, and force them to make them if not enough product exists to satisfy my right), or to say that nobody can have something that everyone can’t have — the logical result of accepting the notion that the more-well-off must pay for whatever the less-well-off want or need but cannot afford. You could draw some arbitrary limit on how much property you can take away from someone, but that’s effectively saying that one’s right to income and property is subject to the ever-changing whim of politicians.

Both of those paths lead directly to a world where miraculous new treatments stop coming into existence. I don’t think any rational person would argue that they lead to ever-increasing affordability of ever-more-advanced treatments, at least.

HBase vs Cassandra: why we moved (via Bits and Bytes | Dominic Williams)

Passing along an interesting post from Bits and Bytes. Dominic’s take is (in part) that the two take different approaches to Big Data: Cassandra is more amenable to online, interactive data operations, while HBase (built on Hadoop) is geared more towards data warehousing, offline index building, and analytics.

My team is currently working on a brand new product – the forthcoming MMO http://www.FightMyMonster.com. This has given us the luxury of building against a NOSQL database, which means we can put the horrors of MySQL sharding and expensive scalability behind us. Recently a few people have been asking why we seem to have changed our preference from HBase to Cassandra. I can confirm the change is true and that we have in fact almost completed porting our c … Read More

via Bits and Bytes | Dominic Williams

What Would the Holy Grail of ORM Look Like?

Recent experiences and articles I’ve read have got me thinking about ORM again, and trying to conceive of what the perfect one would look like (when it’s not custom-matched to a specific set of patterns that I control).

The Microsoft Data Access Block was one of the first frameworks I used to make boilerplate data operations easier. Incidentally, it also led me down the evil path of exposing data access methods as static methods. I evaluated both the Entity Framework and LINQ to SQL for a large green-field project and neither was up to snuff at the time. I’ve recently migrated to Java development on Linux and gotten my fingers into Hibernate — enough to conclude that I hate it with a vengeance. Come to think of it, I’ve never seen an ORM framework that I thought fully did the job, so I ended up going with the roll-your-own approach on that last project. That’s fine — maybe even superior — when you fully control the access/retrieval/update/delete patterns. But what criteria would make a new tool stand out for general adoption?

First, the short shopping list:

  • Stored procedure support (a biggie for me)
  • Batch saves
  • Client-side filtering/sorting
  • Awareness of new/clean/dirty/deleted objects: optimize wire traffic by not sending clean objects, and let me process multiple insert/update/delete operations in the same batch
  • Intelligent awareness and automatic management of datetime properties like created/modified
  • The ability to do soft deletes: set a ‘deleted’ or persistence-status property, and omit those rows from standard fetches

A few additional things I look for, some a little unorthodox:

Put nullability checks in get methods that return nullable collection types. I’m sick of seeing null reference exceptions when people try to render a child list that’s not populated — they’re ugly and they disrupt debugging sessions.

Let me generate extended enums (Java) or enum-type classes (.NET, where it’s a bad idea to inherit from enums) from stored data (e.g. in tables somehow flagged as being an application enum). Look at the classic Java “Planets” enum example for a use case. This helps keep typo-prone string-based lookups out of the codebase.

Don’t push me into the entity:table paradigm. Maybe some entities are more easily used by exposing a few foreign properties on them (like names/labels that correspond to foreign keys). That facilitates much terser code and reduced IO. It’s not that hard to handle this, either; make those properties read-only and omit them from saves. Voila!

Give me smart “GetBy” parameter inference. Good candidates are primary keys, foreign keys, indexes/unique keys (including compound ones), and primary keys of other entities that have a foreign key to this. Bonus points for letting me browse the ancestor hierarchy and create GetBy methods for, e.g., grandchildren by grandparent, without having to fetch the intermediate (parent) first if I’m not going to show it. Similarly, give me delete-by-id and delete-by-instance methods.

Add “stale instance” checks to prevent overwriting more recent changes by others. (Huge bonus points if you can actually fetch the newer remote changes and merge them with the local ones when no conflicts exist.)

Provide an easily swapped-out data provider interface – don’t tie me to any specific backing store. This is a tall order, since it requires multi-way type mapping, plus decoupling and isolation of all provider-specific options and settings, and a backing-store-agnostic controller layer on top of the data layer. Controllers deal with business intentions, but often must translate those into provider-specific language. This means controllers must pluggably or dynamically support data providers, without built-in knowledge of all of the types or options they use (probably via the Mediator http://en.wikipedia.org/wiki/Mediator_pattern or Adapter http://en.wikipedia.org/wiki/Adapter_pattern patterns).

Do not introduce any dependencies into POCOs/POJOs – for example, Hibernate forces its own annotations/attributes into the persistable classes, which makes them unusable in, e.g., GWT client code. Now I need to duplicate entity code in DTOs, and to create converter classes, for no other reason than to have a dependency-free clone of my entities. It’s wasteful, it promotes code bloat, and it introduces opportunity for error.

Similarly, facilitate serialization-contract injection – I’m sick of being unable to use the same entity for e.g. XML, binary, JSON and protobuf just because I need to serialize it in different ways (e.g. deep vs shallow, or using/skipping setters that contain logic). Why do my serialization preferences need to be written in stone into my entities? (Nobody does this well yet, IMO, and it’s not easy either.)

Those last two are biggies: Putting control statements into annotations/attributes is an egregious violation of separation of concerns (SoC). Serialization, data access and RPC frameworks all want you to embed their control flags into your entity layer. Enough already! My entity layer is just that… a collection of dumb objects. Give me an imperative way to tell your framework what to do with my objects, or go home.

All code generation should be done at design time (as opposed to during build or at runtime) – for Pete’s sake, stop slowing down my builds and adding more JIT operations to my running app. (Do I need to mention that dynamically generated SQL is evil? And have you seen what ugly dynamic SQL Hibernate spits out?) Also, give me code where I can see the fetch/save/ID-generation/default-value-on-instantiation semantics without looking through 8 different files to trace it. The longer I code and the bigger the projects & teams I work on, the more I favor imperative approaches over declarative or aspect-based ones; whether I want the 3rd generation descendants to be fetched — whether lazily or eagerly — is a function of where I am in the app and what I’m doing, not of the entities themselves.

Don’t force a verbose new configuration syntax on me; use enumerations and flags that are in visible, static code, and write them with inline documentation so that explanations are visible in javadoc popups and Visual Studio mouseover tips. Pass those enum/flag values to DAO constructors and methods to control, for example, whether to re-fetch after save, what descendants to save or fetch along with the parent, etc.

Am I being too demanding? Am I missing some biggies? Programmers, let me know your thoughts!

Discarding or Rolling Back Changes in Git

When moving to Git for version control, I was amazed at how much trouble people have trying to revert a file or project to a previous state, and even more so at the variety of solutions I saw. People try (and recommend) everything from surgical to nuclear approaches to this — e.g. git checkout, git rebase, git revert, git stash or branch-and-then-discard, or even deleting your entire working directory and re-cloning the repository! Yet with many of these, people would still end up with unwanted changes left in their working copy. One problem is that certain commands are only appropriate for changes that have already been committed, while others are meant for changes that are still staged or sitting in the working tree.

When I have a version I want to roll back to, I don’t like having to sort through what’s committed and what’s uncommitted; I just want to get back to that version. I’m all about finding something that works reliably and repeatedly in a way that I understand. git checkout <start_point> <path> is the “something” that seems easiest to me for reverting specific files back to specific previous states, and so far this approach has never left me with undesired changes remaining in my working copy.

Here’s the skinny…

First, get a simple list of the last few commits (7 in this example) to the file in question:

~/projects/myproj$ git log -7 --oneline src/main/java/settings/datasources.xml

Output (newest to oldest):

74106b9 Renamed PROD database
db05364 Changed root password
0d56c8b Renamed QA database
efc7eb0 Changed some hibernate mappings
97e68fe Added comments
a2c492f Fixed xml indentation
c1b0310 Wrecked xml indentation

Let’s say those last two commits were erroneous. Then using the syntax “git checkout <start_point> <path>” you would just do:

~/projects/myproj$ git checkout 0d56c8b src/main/java/settings/datasources.xml

All done!
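If you want to convince yourself of the behavior before trying it on real work, here’s a self-contained sketch of the same technique against a throwaway repository (temp directory and file name are purely illustrative; assumes git is on your PATH):

```shell
#!/bin/sh
# Throwaway demo of `git checkout <start_point> <path>`:
# commit a good version, commit a bad one, then revert the file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo

echo "good config" > datasources.xml
git add datasources.xml
git commit -qm "Good version"

echo "broken config" > datasources.xml
git commit -qam "Bad version"

git checkout HEAD~1 -- datasources.xml   # revert the file to the prior commit
cat datasources.xml                      # prints: good config
```

Note that checking out an old version of a path restores both the working copy and the index, so a plain git commit afterwards records the rollback.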

Have other tricks for making “rollbacks” easier? Let me know in the comments!

Happy coding.

Commandline Fu: Find and Replace strings in files without opening them using ‘sed’

One thing you discover when moving from Windows to Linux is just how much you can accomplish from the console [aka commandline / terminal] in Linux. There are withdrawal pains at first, of course. Things seem arduous and difficult; you have to look up the syntax of different commands over and over, and you want your GUI back. Little by little, though, it strikes you just how much time you’re saving.

Consider this scenario: You’re working on an app to automate data migration for a MySQL database, for example to update a QA database with data from the production instance. You need to extract the data as ‘INSERT’ statements, probably using another command-line tool, mysqldump. Some of that same data will already exist in the QA copy, though, causing conflicts when you try to load the PROD extract. Fortunately, the MySQL developers thought of this sort of thing and provided a ‘REPLACE INTO’ command; it works just like ‘INSERT INTO’ except that it updates any data that already exists in the destination instead of trying to insert it again. However, mysqldump writes out ‘INSERT’ statements, not ‘REPLACE’ statements.

Enter the ‘sed’ command in bash. sed is a stream editor for filtering and transforming text. Using sed in conjunction with mysqldump and bash’s powerful piping and redirection capabilities, you can do all three of these things in one fell swoop:

  1. Use mysqldump to extract your data in a format that’s easily loaded into another database;
  2. Find every occurrence of the phrase ‘INSERT INTO’ in the extract and replace it with ‘REPLACE INTO’ using sed;
  3. Redirect the modified output from sed into a file (it would normally go to the screen, which is probably less than useful).

Using these commands you can do this all with one line of text at the command prompt (ignore the wrapping and type it on a single line, substituting your own database name and any connection options such as -uroot -p):

$ mysqldump --raw=true --skip-opt --column-names=false -n -t -e -c --hex-blob your_database
  | sed -e 's/INSERT INTO/REPLACE INTO/g' > data_extract.sql

Pretty cool, huh?

Here’s what’s going on. First, mysqldump extracts the data (I’ll explain all the switches further down). Next, bash’s pipe operator ( “|” ) tells the command interpreter to send the output of the preceding command to another program before displaying it on the console. We sent it to sed, and gave sed an expression telling it to replace every ‘INSERT INTO’ occurrence with ‘REPLACE INTO’. Lastly, bash’s redirect operator ( “>” ) sends the output of everything leading up to it into a file named data_extract.sql instead of showing it on the screen. Voilà! You have a file you can import conflict-free into your QA database.

Using ‘-e’ with sed means an expression will immediately follow. The pattern for find-and-replace expressions is ‘s/pattern/replacement/[flags]’. We used ‘/g’ for flags, which means replace all occurrences of pattern with replacement. (See here for a more in-depth tutorial on sed.)
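To see the pipe, the substitution, and the redirect working together without touching a database, here’s a minimal stand-alone sketch that uses printf to fake two lines of dump output (the file name is arbitrary):

```shell
#!/bin/sh
# Fake two lines of "dump output", rewrite INSERT -> REPLACE, save to a file.
printf "INSERT INTO t1 VALUES (1);\nINSERT INTO t2 VALUES (2);\n" \
  | sed -e 's/INSERT INTO/REPLACE INTO/g' > demo_extract.sql

cat demo_extract.sql
# prints:
# REPLACE INTO t1 VALUES (1);
# REPLACE INTO t2 VALUES (2);
```

The ‘g’ flag matters once real dump output is involved: extended inserts can pack many rows onto one line, and without it sed would only replace the first match on each line.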

Lastly, here’s a bit of explanation on what all those arguments to mysqldump were all about. mysqldump can extract a database’s structure, data, or both. You control the specifics with arguments, some examples being:

# -c = --complete-insert: insert using explicit column names
# -d = --no-data: dump structure only, no rows
# -e = --extended-insert: multiple rows per INSERT statement,
#      instead of one-by-one INSERTs
# -n = --no-create-db: don't create the database in the destination
#      (i.e. use the existing one)
# -t = --no-create-info: skip CREATE TABLE statements
# -p = ask for a password; -psecret = --password=secret
# -q = --quick: stream rows, don't buffer the entire dataset
#      (good for large tables)
# -uroot = connect as the root user
# --skip-triggers = don't dump triggers
# --hex-blob = convert binary columns to 0xHEX notation, the same
#      format as:
#      SELECT CONCAT('0x', HEX(UNHEX(REPLACE(UUID(), '-', ''))));
# --single-transaction is a much better option than locking for InnoDB,
#      because it does not need to lock the tables at all.
#      To dump big tables, you should combine this option with --quick.
# --opt / --skip-opt (ALWAYS put BEFORE -c and -e!): --opt is shorthand
#      for --add-drop-table --add-locks --create-options --disable-keys
#      --extended-insert --lock-tables --quick --set-charset. It should
#      give you a fast dump operation and produce a dump file that can
#      be reloaded into a MySQL server quickly. As of MySQL 4.1, --opt
#      is on by default, but can be disabled with --skip-opt; that also
#      gets rid of the MyISAM-only "disable keys" statements. To disable
#      only certain of the options enabled by --opt, use their --skip
#      forms; for example, --skip-add-drop-table or --skip-quick.

Happy terminals!

Finally, a Linux alternative for Jing and Screencast.com!

I finally found a reasonably complete Linux replacement for Jing — at least for still captures (screenshots, not screencasts). A few not-too-difficult setup steps and you get easy hotkey-based rectangular screenshots with two-click short-URL uploads.

Try it! See http://shutter-project.org/ for more info; here’s the Shutter Quickstart for Kubuntu (will vary for Gnome users):

$ sudo add-apt-repository ppa:shutter/ppa
$ sudo apt-get update && sudo apt-get install shutter

Run the Shutter app, dink with preferences as you see fit, then…

Gnome: Shutter preferences can set your keybindings
KDE: K menu -> System Settings -> Shortcuts and Gestures -> Custom Shortcuts

  • Right click “Preset Actions” -> New -> Global Shortcut -> Command URL
  • Trigger = your key combo preference (I used Ctrl+Shift+J since it’s the same as Jing)
  • Action = shutter -s (for Selection based capture, i.e. rectangular region, RTFM if you want a different default)

then…

  • Create a Ubuntu One account at https://one.ubuntu.com/
  • Install the Ubuntu One client (available in KPackage Manager)
  • Run the Ubuntu One client and enter your account details
  • Find the tab with the “Connect” button, click it, tell it to share/sync files at least

now you’re ready…

  • Take a Shutter screen cap using your previously configured hotkey
  • Right click the image in the Shutter window that follows, select Export
  • Select the Ubuntu One tab
  • In the “Choose folder” dropdown, select “Other…”
  • Navigate to ~/Ubuntu One/
  • Create ‘img’ or ‘pic’ or ‘My Beautiful Digital Pictures’ or whatever you want to call your shared pics directory
  • Save in that folder

The upload will happen automatically*, and when complete a short URL will be on your clipboard (you’ll get a toaster message).

*as it will with any content placed underneath ‘~/Ubuntu One’

There are other sharing options available, but the configuration for them in Shutter is still rough around the edges (to be polite).