usable in any place a human can be used


20101022

freelance

[caption id="attachment_875" align="alignright" width="300" caption="I'm not sure what this has to do with Freelancing, thanks Google!"]House made out of $50 bills[/caption]

I've been hard at work making sure that Faveroo keeps humming along, pushing out new features here and there and making sure everyone is happy. The growth we've seen has been steady and we are continuing to push forward. Work has been a little hectic lately; if anyone ever tells you that growing a product and a brand is easy, you just punch them right in the face. I don't care that it's your Grandma, do what I say, I'm a blogger. Anyway, this blog post isn't about my 9-5, it's about my 6-10.


I do a lot of cool side projects. Some of them, like Prosper, are just for fun; they scratch an itch I have and I think I might be able to use them later. Some never see the light of day; they lose momentum or turn into something that I can't figure out how to make any money or help anyone with, so they just languish on my hard drive. Lately I've been trying to break into freelancing. I thought, how hard could this be? I have years of experience, I'm willing to work for a reasonable price, I write high quality code really fast, and I tend to think that I'm a fairly nice person to work with. Well, it's been an uphill battle so far, but I want to share my thoughts with you.


Online freelance boards just suck. They do. I hate you, I hate them, hate is really the only emotion I can feel about these boards. 90% of the posts are for ridiculously ill-defined or impossible jobs, 99.999% of the replies are from someone in some third world country willing to write the new Facebook for a shiny new nickel, and there is no way to get the point across to anyone that you get what you pay for. I've yet to find any useful paying work from these sites.


Craigslist is a mixed bag so far. When I started on this journey I got some help from a friend who basically just spammed me with every blog post they could find on the subject of freelance consulting: a nice thread on Hacker News, a two-part series by Ryan Waggoner on freelancing. One of the pieces of advice was to use Craigslist, so I tried that. There is a ridiculous volume of stuff in the Computer Gigs section, and since you aren't constrained by geography you can apply to a huge number of them. I tried this for a few weeks, but after sending out tens of emails, most custom written specifically for the project, and only receiving one reply, I decided to try another avenue. I'm not giving up on the Craigslist approach yet, if for no other reason than the absolute mass of postings there.


Friends of friends. Right now this is where I've seen my only success. I'm currently working on a project for a friend of a friend and so far things are looking great. I'm enjoying the work, sweet lady PHP with my favorite unframework, Flourish, and I'm making great progress. It's been a good project to work on so far, the client has been great to work with, and the code has been flowing. I'm sure there's a downside to this course; friends of friends will want you to do extra work for them for free or ask you to change things up as a solid. But so far so good.


Future plans. This blog post is part of it, but basically some self-promotion is in order. I'm going to be putting together an online portfolio of stuff I've done; Prosper is a nice large code base for someone to look at to see the quality of my PHP code. I realized I just need to get my name out there as an option. The current project I got was because I happened to tell my friend I'm trying to start consulting on the side, bam, first paying gig. I'm going to keep fighting the good fight on Craigslist, and maybe come up with a more efficient system for submitting my name into consideration for postings.


If you need a good PHP programmer drop me a line in the comments or send an email to ihumanable@gmail.com and maybe I can write some awesome code for you.

20100831

2 months

[caption id="attachment_864" align="aligncenter" width="590" caption="that's a pretty sweet logo"]Faveroo Logo[/caption]

On June 3rd I typed the following into my Terminal.



[bash]
mkdir faveroo
cd faveroo
git init
[/bash]

On August 15th I typed the following into my Terminal.



[bash]
cd ~/src/faveroo
php deploy.php production
[/bash]

Then I watched and held my breath as my carefully crafted deployment script worked its way through 2 months of code and pushed it all across the line. Then I fired up my browser, went to http://faveroo.com, and waited for the now familiar homepage to show up... I'm sure it took less than a second to load, but it seemed like an eternity. Finally it was up. I worked for the next few hours exercising the application through all its various operations, squishing little bugs here and there that popped up from being in a different environment. Two months of development, and finally it was live, and it actually worked. A few hours later we started hooking up people with sweet deals and putting money in the bank. Mission accomplished.


Faveroo.com is my first project at my new job at 614 Media Group. It was a skunkworks project up through the end of July, so I couldn't come to this blog and talk about it. I did all of the coding for Faveroo and my colleague Jeff Guciardo did all the design work to make it look so pretty (it's probably the prettiest thing I've ever had the opportunity to create). The basic premise of the website is that our team of dedicated sales people go and find great local deals, then we throw them up on Faveroo.com, and then we help people save some cash. When you buy a Faveroo you save some money, the business makes money, we make money, and 3% of the sale automatically goes to charity, it's win-win-win-win. But I'm neither marketing nor sales, so I will stick with what I know and what I knows is tech, so let's talk about that for a minute.


Faveroo is a PHP website that includes the public frontend, a private backend, and a series of maintenance scripts that make sure everything works like clockwork. When I was starting up the Faveroo project I was given carte blanche as to how to build it. All of our other web properties use the classic LAMP stack, so to keep operations fairly sane, and because I deeply love PHP, I decided to build out Faveroo on a classic LAMP stack as well. The code is object oriented, nearly MVC, PHP 5.2 code. I had been looking around at various PHP web frameworks for a long time. I had just come off of working with Rails on a side project, so I knew the joy and frustration of working with a framework.


As you may be aware, I'm crazy in love with Flourish and decided that I would use it as a major component. I have been a fan of Flourish for a while now, probably over a year, but this was the first big application I was going to use it on, and really the first large scale from scratch application I have ever written. Now don't get me wrong, I'm no rookie to this slinging code stuff, I've maintained huge legacy applications, built middle-ware up from nothing, and even rewritten large applications to the point that almost all of the original code has been replaced. But this would be the first time that if I didn't like something in my code, it was because I was the stupid jack-ass that made such a boneheaded decision. Well, not the first time, but the first time I couldn't easily blame someone else ;)


I want to say that the decision to go with Flourish is probably what made the rapid turnaround time possible. It's a wonderful library that helps you do things right but doesn't force you to do anything you don't need. The thing that amazed me as I used it is that I started off only wanting to pull bits and pieces, maybe some fCryptography here and a little fMessaging there, but as I got familiar (and boy howdy did I get familiar) with the docs, I just found more and more amazing functionality and a beautifully coherent system. Flourish was clearly written to be used, and by the end of the project I found I was using almost every class. It's clearly a tool written for real world use from real world experience.


Flourish and jQuery are really the 2 big external dependencies for Faveroo, the rest of the code was written by me. I found this minimal system worked very well for me. I wrote a router in about 100 lines of PHP code, it's nothing earth shattering but I think it has a few novel ideas. I've since built a new version of this router that is shorter and has less complex code paths. At some point in the future I may try to make a more generic version of this router and release it open source. All of the model classes are fairly straightforward using the excellent fActiveRecord as a base.
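The router itself isn't open source yet, but the core idea fits in a handful of lines. Here's a rough sketch in Python rather than PHP; the routes and handlers below are invented for illustration and are not taken from Faveroo's actual code.

```python
import re

# Each route is a compiled URL pattern paired with a handler; named groups
# in the pattern become keyword arguments to the handler.
ROUTES = [
    (re.compile(r"^/$"), lambda: "home"),
    (re.compile(r"^/deal/(?P<id>\d+)$"), lambda id: f"deal {id}"),
]

def dispatch(path: str) -> str:
    """Find the first route matching the path and invoke its handler."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(**match.groupdict())
    return "404"

print(dispatch("/deal/42"))  # deal 42
print(dispatch("/nope"))     # 404
```

The whole trick is just first-match-wins over an ordered list of patterns; everything else a router does is convenience layered on top of that loop.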


I spent about a week spinning up my minimalist framework, but it paid off big. I knew how the system worked every step of the way, and more importantly, I could alter it in minor ways to achieve dramatic results. All of this is possible with established frameworks, but here I got away without having to climb up a learning curve. This gave me more time to understand the problem domain and to learn how to best use Flourish.


With this experience under my belt I'm looking forward to continuing to learn, and hopefully to contribute back to, Flourish and PHP development in general. This project has shown me that in 2 months I can go from nothing to cash hitting the bank account. I feel reinvigorated to make cool things that add value to people's lives and reassured of my abilities as a programmer. After a long period feeling burned out and wasted, I remember why I love programming, and why I love creating, and why this is the only thing I can ever see myself doing.

20100331

micro-optimization

[caption id="attachment_818" align="alignright" width="277" caption="Yea, I know it's on fire, but check it, I almost got Freebird down pat."]nero fiddling as rome burns[/caption]

As all good developers should know, one of the cardinal sins of software development is premature optimization. Premature optimization is bad for a whole host of reasons. But there is another set of optimizations that are in the same realm of bad: micro-optimizations.


I recall learning at some point that in C and C++ it is more efficient to use the pre-increment operator rather than the post-increment operator. Why is this the case? Well, it's because of the minor difference in behavior between the following two snippets.


[cpp]
int source = 10;
int destination = 0;

destination = ++source;

printf("Destination: %d", destination); //Prints "Destination: 11"
[/cpp]

Compare with this snippet


[cpp]
int source = 10;
int destination = 0;

destination = source++;

printf("Destination: %d", destination); //Prints "Destination: 10"
[/cpp]

WHA!? This is actually exactly what we would expect: pre-increment increments the value BEFORE assignment while post-increment increments the value AFTER assignment. What does this all mean? Basically, that if you use post-increment you are executing more instructions, because the compiler has to keep the old value around to return to the assignment. That means that unless you are assigning the value and require the particular behavior that post-increment gives you, you can generate faster code by using the pre-increment operator (though this may be handled by compiler optimizations these days).


All of the for-loops that you've written for(int i = 0; i < something; i++) are identical to and SLOWER than for(int i = 0; i < something; ++i). NOOOOO!!!!! Don't worry though, because this is a micro-optimization, it's something not to care about, because in reality that one or two extra machine instructions isn't your bottleneck. Micro-optimizations are all those tricks that make code faster that in the vast majority of cases (although not all cases) don't really amount to any actual noticeable performance gain. A new processor can do something in the magnitude of 100,000 MIPS (Millions of Instructions Per Second). Every second it does 100,000,000,000 instructions, that is 100 billion instructions. every. single. second. Changing from post- to pre-increment saves 1 or 2 instructions, so every time that line gets executed you have saved about a 100th of a nanosecond.


Micro-optimizations are rarely worth the hassle, and, as with premature optimization, the solution is to benchmark and trace where your actual bottlenecks are and go about solving those. It doesn't matter if you're screaming through your increments saving a whole nanosecond if your crappy no index having table is taking a whole minute to read because you are doing a table-scan.
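In that spirit, measuring beats guessing. Here's a quick sketch, in Python using the standard library's timeit module, of how you might benchmark two equivalent pieces of code before declaring either one the "fast" version; the snippets being timed are arbitrary examples, not anything from this post.

```python
import timeit

# Time two equivalent ways of building a list of squares, 2000 runs each.
loop_time = timeit.timeit(
    "xs = []\nfor i in range(1000):\n    xs.append(i * i)", number=2000)
comp_time = timeit.timeit("xs = [i * i for i in range(1000)]", number=2000)

print(f"for-loop:      {loop_time:.3f}s")
print(f"comprehension: {comp_time:.3f}s")
```

Only after numbers like these point at a real bottleneck is it worth changing the code; anything else is fiddling while the database table-scans burn.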


But wait, there's more. Micro-optimization, like everything in programming, can be directly applied to life in general. I work with a person who will spend 5 minutes discussing how to save 30 seconds, this is another incarnation of the micro-optimization. It's that shortcut you take home that actually takes longer, or some everyday voodoo that is draining your time without paying you back, or that client that provides you with 2% of your income but eats up 40% of your effort. Micro-optimizations abound all around us, in the little picture they seem ok, but in the big picture they are nonsense. Take a fresh look at the things and obligations around you and examine them for micro-optimizations, see if the things you do make sense in the big picture.

20100211

password usability

A friend pointed me to an interesting article on A List Apart. If you don't want to read through the whole thing, it basically says that a lot of password resets are caused by people remembering their passwords correctly but mistyping them. The security concerns that once made masking passwords with asterisks standard practice are now being eclipsed by the usability problems this design has introduced. The post goes on to describe two potential alternatives: a toggle to show the password in plaintext (similar to the WiFi configuration screen in Windows) or showing the last character typed while masking the rest (similar to the iPhone or Android password inputs).


Both of these options are interesting and I personally would like to see either one gain greater acceptance, although with the rise of password managers built into operating systems and web browsers it seems less and less necessary. The problem with both of these techniques is discussed in the article: changing the functionality of the password input undermines the user's confidence in your site's security. This is why I think changing the nature of password inputs is dubious at best until it gains widespread adoption, maybe if Google or some other web giant were to implement them. Until that day I think a fine alternative would be Mattt Thompson's Chroma-Hash.


Chroma-Hash augments a password input with extra information: something that is easy to remember and easy to notice when it's wrong, a color swatch called the Chroma-Hash. Let's take a look at how it works.


[caption id="attachment_722" align="aligncenter" width="461" caption="The password (conveniently enough 'password') generates the colorful hash to the right."]chroma hash example[/caption]

The passwords match because the colors match; when entering your password you are informed of mistypings immediately by the hash being incorrect. Let's take a look at what happens if we carelessly fat-finger the confirmation, typing "passworf" instead of "password" like it should be.


[caption id="attachment_723" align="aligncenter" width="465" caption="One little letter completely changes the Chroma-Hash, immediate feedback"]chroma hash with mismatch[/caption]

Small changes in the password generate big changes in the Chroma-Hash. The human brain is one of the best pattern matching engines in the world, and Chroma-Hash leverages this fact. Very small changes in a site's design or color scheme are detectable; that's why people make a big deal when a site they commonly visit changes things, even slightly. This makes Chroma-Hash ideal for serving as a "password proxy." Others can see the Chroma-Hash and gain no information about your password, and yet it instantly gives you a wealth of feedback about whether or not you have entered the correct password.
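The underlying idea is simple: hash the password and turn a few bytes of the digest into colors. Here is a rough sketch of the concept in Python; this is my own illustration of the technique, not Mattt Thompson's actual JavaScript implementation.

```python
import hashlib

def chroma_hash(password: str, swatches: int = 3) -> list[str]:
    # Hash the password, then carve the hex digest into 6-digit RGB colors.
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return ["#" + digest[i * 6:(i + 1) * 6] for i in range(swatches)]

print(chroma_hash("password"))  # the same password always yields the same colors
print(chroma_hash("passworf"))  # one letter off: a completely different palette
```

Because a cryptographic hash diffuses every input bit across the digest, even a one-character typo flips the whole palette, which is exactly the property that makes the swatches useful feedback.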


Take a look at Chroma-Hash, fork it on GitHub, implement it on your website. You get the advantage of recognizable feedback without needing to change the fundamental way in which the password input works.

20100209

time

[caption id="attachment_708" align="alignright" width="300" caption="Finally was able to get the clock to show me its good side, dirty girl"]clock[/caption]

Don't worry, this won't be a rant about mortality or getting things done or any of the philosophy that has been dominating this blog as of late. This is back to basics, a discussion about software and a particularly tricky aspect of it: time. Not time as in scheduling and managing time, but something far more fundamental, representing time. It is an insidiously tricky problem and one that can be quite difficult to wrap your head around.


The problem comes from how we think about time as people living normal lives. "I'll meet you at 3 o'clock" is a rather dull and completely normal type of phrase to say to someone. As two normal people living normal lives, the simple phrase "3 o'clock" is plenty to convey when they should meet. This is because there exists a great deal of unspoken context between the two parties: if we are meeting for a business meeting I clearly mean 3:00pm, not 3:00am. If we both work in the same building I probably mean relative to our shared timezone, 3:00pm EST not 3:00pm GMT. There is a world of shared unspoken context that makes human-human time discussions easy and natural.


Computers are really stupid though; they need everything spelled out. If you were trying to store time you might take a naive approach at first and just store the string "3:00", or maybe if you are really thinking it out you would store "3:00pm EST." This method soon starts showing its weaknesses, as it's hard to compare times or perform operations on them. How many hours are between 2:00am EST and 5:30pm CST? That is a nasty problem to try to solve unless you have some way to represent times in the abstract.
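With a proper abstract representation, that question becomes a one-liner. A sketch in Python, using the standard zoneinfo module (Python 3.9+, and it needs the system's timezone database):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 2:00am US Eastern and 5:30pm US Central on the same winter day.
est = datetime(2010, 2, 9, 2, 0, tzinfo=ZoneInfo("America/New_York"))
cst = datetime(2010, 2, 9, 17, 30, tzinfo=ZoneInfo("America/Chicago"))

# Subtracting timezone-aware datetimes works in absolute time.
hours_between = (cst - est).total_seconds() / 3600
print(hours_between)  # 16.5
```

The library does the messy part, converting both wall-clock times to the same absolute timeline before subtracting, which is precisely what the string "3:00pm EST" can't do on its own.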


A number of formats step in to represent time. There is the venerable Unix timestamp, which is the number of seconds since Jan. 1, 1970; as of this writing it stands at 1265738039, but feel free to check for yourself. Then there are numerous proprietary formats like Microsoft's, Oracle's, etc. These all allow you to represent an exact moment of time in a portable, abstract way with no dependence on the cavalcade of context us fleshy humans share.
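Most languages hand you the Unix timestamp for free. In Python, for example:

```python
import time
from datetime import datetime, timezone

# Seconds since Jan. 1, 1970 UTC (the Unix epoch).
now = int(time.time())
print(now)

# Decoding the timestamp quoted above lands squarely on this post's date.
print(datetime.fromtimestamp(1265738039, tz=timezone.utc))
```

One plain integer pins down an exact instant anywhere on Earth, which is why it travels so well between systems.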


Well, problem solved, just bust out your favorite abstract representation and you are done. Not so fast, there are many other considerations to take into account when dealing with time. There are of course the tricky problems of Daylight Saving Time, leap years, and the like. Imagine you are trying to add an event to a calendar system every day at 5:00pm EST; you'd think you could just add it to today, then add 24 hours and create a new event. DST hits your algorithm over the head at some point and everything is off an hour, oh no! Also, now you have a ton of data to represent one basic fact: something happens every day at 5:00pm EST. It's only one fact, you should need one record, not an infinite number of 5:00pm EST records. This hints at the next difficulty of time.
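Here's that trap in action, sketched in Python with the standard zoneinfo module (Python 3.9+): adding 24 absolute hours across the 2010 US spring-forward boundary lands the "daily 5:00pm" event at the wrong wall-clock time.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
# A 5:00pm event on the day before US clocks sprang forward in 2010 (Mar 14).
event = datetime(2010, 3, 13, 17, 0, tzinfo=ny)

# "Just add 24 hours" of absolute time: convert to UTC, add, convert back.
next_abs = (event.astimezone(timezone.utc) + timedelta(hours=24)).astimezone(ny)
print(next_abs.strftime("%H:%M %Z"))   # 18:00 EDT -- off by an hour!

# Wall-clock arithmetic in the same zone keeps the 5:00pm slot instead.
next_wall = event + timedelta(days=1)
print(next_wall.strftime("%H:%M %Z"))  # 17:00 EDT
```

Which of the two is "right" depends on whether the event is pinned to wall-clock time or to absolute time, and that ambiguity is exactly why calendar code is so easy to get wrong.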


Humans think about repeatable events (sometimes with complex and insane rules) as commonplace and easy. This thing happens on the third Thursday of every month, unless of course Monday was a holiday and then it gets shifted to Friday. The problem with time and dates and repeating events is that human beings erected a ton of complex processing rules before they realized we were going to try and digitize them. These are difficult to represent and difficult to get right.


At first the task of representing arbitrary points and spans of time seems fairly straightforward, but it is a complex and nuanced task, like most things the devil is in the details. Before you go off half-cocked building up your own representation, take a look at some established formats, like Unix Timestamps, RFC5545, and ISO 8601.

20100201

open format

[caption id="attachment_663" align="alignright" width="300" caption="We'll just lock your data up in here, don't worry we'll open it up later if you need it. This is what a closed format sounds like"]bank safe[/caption]

Back in the days when a computer had 64k of RAM and high-capacity meant you had a double-sided floppy disk, there were these funny little things called binary file formats. We still have them today; they are the "pattern" that a program will write and read to and from disk to save and load a file. Some are open, like image file formats, and some are closed, like Microsoft's binary blob formats for Office. As the world has advanced and storage has become dirt cheap, people started looking for an easier way, and whenever there is a problem that we don't have a tool for yet, we reach for XML.


XML is actually not a bad fit for this problem domain. It's a little on the ugly side, but that's ok, very few people crack open a raw file to read it. The people that are cracking open files in a text editor to peer inside are probably tech-savvy enough to read a little XML. The big bonus of switching to XML, like ODF or DOCX, is that there are very mature XML parsers and writers for almost every programming language. That means that a determined programmer can grab a copy of your format spec, her favorite XML parser, and write a program that operates on your file format. This is the essence of that oh-so-marketing-speak word, synergy.


Now I would love to take credit for this awesome idea, but if you've read The Art of Unix Programming you will recognize this as nothing more than an extension of the Rule of Composition. Now that you've read the Rule of Composition (and you should make the time to read the rest of the Art of Unix Programming, even if you never plan on programming anything for Unix, the lessons within are just in general great to know), you will recognize the inherent advantage of having parsable file formats. Now that I have cunningly converted you over to my way of thinking, what format should you use?


XML


XML (eXtensible Markup Language) is the old standby. It is reliable, standardized by the W3C, well-supported, and there are tons of tools around for playing with XML. XML is "human-readable"; for those of you unfamiliar with it, here is an example.


[xml]
<book>
  <title>Example Book</title>
  <pages>
    <page>
      <![CDATA[
      ...page stuff here...
      ]]>
    </page>
    <page>
      ...you get the idea...
    </page>
  </pages>
</book>
[/xml]

It's a bit tag-heavy, and that means a slightly larger file size. This isn't a huge concern since storage is so cheap, but you should be aware of it. XML has a neat feature called the DTD (Document Type Definition); armed with one, most parsers will be able to tell right away whether a document conforms to the expected structure. XML is big and clunky but a fine choice for many applications, especially if you need typing information.
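That parser maturity is easy to demonstrate. Reading the book example takes a few lines with Python's built-in xml.etree, for instance (the CDATA block is dropped here for brevity):

```python
import xml.etree.ElementTree as ET

# The book example from above, minus the CDATA block.
doc = """\
<book>
  <title>Example Book</title>
  <pages>
    <page>...page stuff here...</page>
    <page>...you get the idea...</page>
  </pages>
</book>"""

book = ET.fromstring(doc)
print(book.findtext("title"))           # Example Book
print(len(book.findall("pages/page")))  # 2
```

Any language with a decent XML library can pull the same structure out of the same bytes, which is the whole interoperability argument in miniature.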


YAML


YAML (YAML Ain't Markup Language) is the format of choice for the Ruby community. YAML is well supported by most mainstream programming languages, and it is a more lightweight choice than XML. Here is the same thing as above in YAML.


[ruby]
book: Example Book
pages:
  - page: >
      ...here goes my page data...
  - page: >
      ...you get the idea....
[/ruby]

YAML uses the structure of the text to indicate the structure of the data. Ending tags are dropped and indentation becomes more important. YAML looks simplistic at first but has a wide array of functionality hiding below the simple hello world examples. References, hashes, arrays, and much more are possible with YAML. The specification allows you to make concise documents that contain an astounding amount of information.


JSON


JSON (JavaScript Object Notation) is a lightweight way to represent data structures. JSON excels by being incredibly simple to learn and use. There is native support for it in JavaScript, which makes it ideal for use in AJAX (which would then technically be called AJAJ), and there are JSON parsers available in most mainstream languages. Here is the example from above in JSON.


[javascript]
{ "title": "Example Book", "pages": [ "...page stuff goes here...", "...you get the idea..." ] }
[/javascript]

Just like in JavaScript, everything in JSON is built from objects (hashes) and arrays. JSON is a simple typeless data recording system, perfect for dynamic languages. JSON is a proper subset of YAML 1.2, so most YAML parsers can also parse JSON. JSON's incredibly lightweight nature lends itself to being used when sending data over a wire or when space is at a premium.
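Round-tripping the book example through Python's built-in json module, for instance, takes only a couple of calls:

```python
import json

# The book example again, as a plain Python data structure.
book = {
    "title": "Example Book",
    "pages": ["...page stuff goes here...", "...you get the idea..."],
}

encoded = json.dumps(book)     # serialize to a string
decoded = json.loads(encoded)  # parse it right back out
print(decoded["title"])        # Example Book
assert decoded == book         # the round trip is lossless
```

Because JSON maps directly onto the hashes and arrays every dynamic language already has, there is essentially no impedance mismatch between the file and the in-memory data.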


BSON


BSON (Binary JavaScript Object Notation) is a binary form of JSON. It is meant to be more space efficient than JSON but maintain the ease of parsing. It is described as "A General-Purpose Data Format" and was devised as the serialization medium for the MongoDB project. It is an interesting format to look at if space is at a premium and there are already parsers for C, C++, PHP, Python, and Ruby.




File formats no longer need to be gnarled, tangled messes. We have new standards that allow us to freely share data created in one program to be analyzed, manipulated, and turned into something new in another program. Being able to freely interoperate is the future of computing, it is the driving force behind web standardizations, micro formats, and open file formats. The next time you need to persist data to the file system, resist the urge to roll your own serialization scheme, and check out one of the technologies presented above. The world will thank you.

20100129

learn autoit

[caption id="attachment_655" align="alignright" width="300" caption="Bezels and shadows and reflections, oh my"]autoit[/caption]

If you use Windows at work or at play this post is for you; Mac-heads and Linux people, I'm sorry, this post probably won't help you much. I have had the joy of doing documentation work for the last 2-3 months at work. Gnarly, ugly documentation that serves little purpose other than making someone feel happy that there are reams and reams of documentation. At some point, probably around the 5th document, I realized that out of 20 pages it was mostly boilerplate, with about 8 things peppered throughout that changed. I decided that it would be easiest to make a template document and then just do a find and replace on those eight symbols: [[DOCNUM]], [[DOCNAME]], [[DOCTRANSACTION]], etc.
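The stub-and-replace idea itself is trivial to script. Here's a sketch of the same approach in Python; the template text and field values are invented for illustration, not taken from the real documents.

```python
# A made-up one-line template using the same [[PLACEHOLDER]] convention.
TEMPLATE = "Document [[DOCNUM]]: [[DOCNAME]]\nTransaction: [[DOCTRANSACTION]]\n"

def stub_document(template: str, fields: dict[str, str]) -> str:
    """Fill every [[PLACEHOLDER]] in the template from the fields dict."""
    for name, value in fields.items():
        template = template.replace(f"[[{name}]]", value)
    return template

stubbed = stub_document(TEMPLATE, {
    "DOCNUM": "042",
    "DOCNAME": "HCR1 Command",
    "DOCTRANSACTION": "Health Check Request",
})
print(stubbed)
```

The real win described below is the same loop driven against Word itself, so the template keeps all its formatting; but the data model is exactly this: one dict of fields per document.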


Productivity shot up; I could now stub out a document in a few minutes instead of the hour or so that it used to take to manually hunt through and change things. All was well, and after a day or two of find / replacing I had stubbed out the 80 or so documents. Then came the meeting where it was decided that 2 new sections would be needed and that 8 sections could be removed as redundant. This victory for common sense was also a giant headache, as I didn't really look forward to restubbing 80 documents. There had to be an easier way, a better way. I remembered a screen driver tool I had some exposure to at a past engagement: AutoIt.


After reading through the documentation and goofing around with it for a while yesterday I now have a fully functional script that will automatically stub out all of the documentation with the click of a button. A task that used to be an error-prone, manually intensive process now requires no intervention. We can restub on demand now, so changing the template is no problem.


The Good



  • Saves a ton of time

  • Removes the human aspect from an existing process

  • Centralizes specific knowledge

  • Easy to write and test


The new script saves a ton of time; I can regenerate all documentation in about 10 minutes, so I click a button and go walk around the office bugging people. AutoIt simply takes a known working process and steps in to be the person driving it. I didn't have to take time dreaming up a brand new workflow, I could focus on just getting it done. The script is now the system of record for the specific things that change from document to document, which is nice for trying to determine at a glance what makes an HCR1 different from an ARC6 command without digging through 40 pages of documentation. AutoIt also ships with a stripped down version of SciTE with auto-completion and built in support for compiling and running, which makes writing and testing the script a breeze.


The Bad



  • Ugly syntax (like VBScript and PHP had a bastard child)

  • Oddly organized documentation and variably helpful community

  • Inherently brittle scripts

  • Still slower than I'd like

  • Foreground processing


AutoIt has an ugly syntax (think VBScript), but it has all the pieces to make a nice script: functions and variables. The documentation takes a little getting used to; there is plenty in there, but it could be organized better. AutoIt depends on things like the window title and absolute paths, so it is inherently brittle. I doubt this script would run unaltered on someone else's machine. This could just be me being a n00b at AutoIt, but I followed the practices laid out in the tutorials and the documentation. The other bad part about AutoIt is that it drives your screen; it simulates you pressing buttons and mousing about, so the script is slow and you can't interact with the machine while it's running or you will probably mess everything up.


Alternatives


After proclaiming my victory over the documentation monster, I got some replies from colleagues asking why I didn't just write a PowerShell or Ruby or Java program or something. I could have cracked open OpenWriter or something and attempted to build some massive program that could create .docx files, but that would have taken a ton of time. The AutoIt solution was incredibly quick to produce; it took about 2 hours of playing around with it. There were also a bunch of side benefits that the alternatives wouldn't have had. The template file is just a plain ordinary .docx file with all kinds of crazy formatting and images; instead of trying to figure out how to reproduce this in code, I can use Word to do the heavy lifting for me. This allows business users to change the template, and we can rapidly restub and get back up and running.


Conclusion


You should go and learn AutoIt or AppleScript or whatever GUI driver program is available for your platform. It is not the tool for every job, but it's perfect for some jobs. Being the person on your team that can take some boring, error-prone, laborious task and turn it into a turn-crank task is a great thing. You can save your team time, get back to doing actual work more quickly, and make a name for yourself as a problem solver.

20100126

when in rome

[caption id="attachment_624" align="alignright" width="300" caption="Trust me, you want to do what the Romans do, I mean... have you ever built a Colosseum?"]collisium[/caption]

Ah the cliché of it all, "When in Rome." Although it would be nice to have a post about being in or traveling to Rome, sadly this is not the case. Instead, this post is about following convention and how that can help you learn.


I've been getting into Ruby as of late; after learning Smalltalk, Objective-C, and Lisp (to varying degrees), Ruby hits a nice sweet spot, stealing some of the best bits from each. I was recently approached by a friend who does primarily .Net programming about the best way to go about learning Ruby. He had decided that, at least in the beginning, he should use the NetBeans IDE. This seems harmless enough, even logical: someone who spends their entire day in Visual Studio wants the comforting guidance of an IDE. It is also the exact wrong approach.


As he progressed past simple "Hello World" examples and onto his first rails project he found himself battling with the IDE. Odd error messages, the IDE struggling to provide auto-completion on the dynamic models, but most importantly a complete lack of resources. The ruby and rails world to a large degree lives and breathes on the command line, right now if you go to my home computer you will find 4 terminal windows arranged with a mysql shell, script/console, script/server, and one for git. If something goes wrong and I get a weird error, I do what all great programmers do, copy and paste that sucker into Google. I normally find somewhere around a bajillion hits with solutions, huzzah!


My friend on the other hand had trouble doing the simple things through the IDE, like installing gems. I'm certain the IDE was capable of it, and that someone better versed in it would be able to help him out, but the Google-scape was barren to his questioning. All I could say to answer his questions was, "Well, I'm not sure how NetBeans does it, but on the command line just type gem install [gem]" (Also I can speak in teletype, I just do a robot voice.)


Despite the difficulties, my friend clung to the belief that the IDE was the way to go for beginners. I finally realized the perfect analogy to drive home the point and asked, "Would you ever advise someone new to .Net to learn it by using the command line tools?" It's a perfectly valid question; I'm sure plenty of people who hang out in vim all day and don't mind doing things like ./configure && make && sudo make install (thanks Jeremiah for pointing out my n00bishness) would feel much more at home on the command line.


I am not free of fault on this either, when I attempted to learn git for the first time, I used TortoiseGit. I was very comfortable with TortoiseSVN and thought it would ease the transition. It sure did, I was able to treat git exactly like svn, and completely miss the point! Once I moved to the command line (and watched some screencasts and read some articles) I felt much more at home with git, and even began to understand why I would want to use it over svn. I had stopped trying to make it something it isn't and embraced the thing that it is.


The point here is that when you learn something new, it's new, don't bring along your technological baggage. If the community all rallies around some IDE (.Net, I'm looking at you) then for flying spaghetti monster's sake, use that IDE. If they rally around the command line (ruby, I'm looking at you) then by the hammer of Thor, use the command line. If they rally around some weird virtual environment machine (smalltalk, I'm looking at you) then have fun and realize you are never going to be able to use that at work (just poking fun, I <3 smalltalk).


Learning something new is often uncomfortable; you have to feel like a tourist for a while. When in Rome, though, do as the Romans do. Embrace the methodologies, naming conventions, and tools that are the community standard. Give it a few weeks, and if you still find yourself hating it, then maybe that particular technology isn't for you. It's ok, we can't all love everything. Life is about trying new things and figuring out what makes you want to get out of bed in the morning; for some people it will be .Net, for some ruby, for some COBOL (just kidding, no one likes COBOL).


You never do yourself any favors by half-way learning something, because you will either hate it (because it was poorly shoehorned into an inappropriate paradigm) or you will learn to love it. "That second thing doesn't sound too bad" (I can sense your thoughts), and at first blush it's not, until you want to start participating in the community by using others' code and sharing your own. Then you will have to unlearn all the bad practices you've adopted and face a difficult transition into the community, attempting to erase months of muscle memory. Save yourself the difficult task of unlearning and simply embrace the technology in full to begin with. It's a steeper curve, but the payout is the depth of understanding.

20100125

api

[caption id="attachment_618" align="alignright" width="142" caption="oh creator of hot pockets, we praise thee!"]samsung microwave[/caption]

I remember being a young lad preparing myself for university when I was given a gift from my mother: "C++ for Dummies." The vote of confidence on my status as a "dummy" aside, I read the book with great interest. There was an analogy the author used to explain the idea of classes and what functions they should expose; I'm going to shamelessly steal it (paraphrasing, as I don't have the book with me).


Imagine your son comes up to you and says he wants to make some nachos. You tell him that it's fine by you, just go cook them in the microwave, and there is nothing controversial about this statement. Microwaves are actually boxes full of high-energy radiation, produced by a cavity magnetron and piped in through a waveguide; to the end user, though, the microwave is just a magic warming box. It exposes a very simple interface: some buttons that let you enter how long you want to cook something.


This is the essence of an API (Application Programming Interface): wrapping up something complex and possibly dangerous in something safe and easy to interact with. When building code that you intend other people to use someday, the interface is the most important part, and the part that is easiest to overlook. The problem is that we are often too close to the problem and too concerned with our own use case. Designing code for others to use requires significant time and effort, and even then you probably still won't get it right.
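Here's that microwave rendered as a minimal ruby sketch (the class and its internals are invented for illustration): the caller gets one safe button, and the dangerous guts stay private.

```ruby
# A toy illustration of API design: complex internals, simple surface.
class Microwave
  # The entire public interface: tell it how long to cook.
  def cook(seconds)
    power_up_magnetron
    emit_microwaves(seconds)
    power_down_magnetron
    "ding!"
  end

  private

  # The dangerous details are hidden; callers can't touch them.
  def power_up_magnetron;   @magnetron_on = true;  end
  def power_down_magnetron; @magnetron_on = false; end

  def emit_microwaves(seconds)
    # Pretend high-energy radiation happens here for `seconds` seconds.
  end
end

puts Microwave.new.cook(30)  # prints "ding!"
```

Nobody warming up nachos needs to know the magnetron exists, and that's the point.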


Prosper is still undergoing active development; I'm currently agonizing over how I want to expose various execution modes. The solution, no matter what I pick, is trivial to implement, but the api is the most important part. A great api exposes a consistent concept, something that is easily grasped and lets the end user declare what they want to do without worrying about how it's going to get done. Since good programmers write good code and great programmers steal great code, I've modeled the api for prosper extensively on jQuery. And why not? Let's take a look at two different APIs: the browser DOM API and jQuery.


[javascript]
//Let's barber pole a list by coloring every other element red
var list = document.getElementById('the_list');
var highlight = false;
for(var i = 0; i < list.children.length; i++) {
  if(highlight) {
    list.children[i].style['backgroundColor'] = '#FF0000';
  }
  highlight = !highlight;
}
[/javascript]

Fairly straightforward implementation, but it concerns itself heavily with the "how" of what it's doing. Manually traversing the DOM to pick the elements it wants, eww.


[javascript]
//Same thing using jquery
$("ul#the_list li:odd").css("background-color", "#FF0000");
[/javascript]

jQuery is definitely magic, but this code is great because it lets you focus on the "what" of what you are doing. How does jQuery go about selecting the right elements and all that? I don't care, and the great thing is I don't have to care; if the next version of jQuery finds a way to do it faster, I win without having to do anything.


Writing a great api is difficult; you have to divorce yourself from the concrete problem you are solving and look at it in the abstract. Put yourself in the shoes of someone trying to figure out how the api works, and then ask the questions they are going to ask.



  • Why do I need to build up this context thing and pass it in?

  • How come there isn't a sensible default for these arguments?

  • What dumbass made this thing?


Answer those questions and keep working at it; strive for elegance and consistency, because then it will be easy for people to learn and use your code. If your code is easy to learn and use, people are going to want to use it more, and they are going to want to tell their friends about it. Then you can get some lucrative ad campaigns with Nike because of the boffo library you wrote in FoxPro.
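On the sensible-defaults question in particular, here's a hedged ruby sketch (fetch_page is hypothetical, not from any real library) of what a considerate api looks like: the common case asks nothing of the caller, and the edge case can still override.

```ruby
# Hypothetical api sketch: sensible defaults answer questions
# before callers have to ask them.
def fetch_page(url, options = {})
  defaults = { :timeout => 30, :retries => 3, :headers => {} }
  defaults.merge(:url => url).merge(options)
end

fetch_page("http://example.com")                 # common case: just works
fetch_page("http://example.com", :timeout => 5)  # edge case: override one thing
```

The options-hash idiom keeps the signature stable as the api grows, which is one less reason for users to ask "what dumbass made this thing?"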


There is a more subtle lesson in all of this though. Any code you write is exposing an api to someone else. "But only I am ever going to use this code!" I hear the naysayers warming up their keyboards for the comments section. This may be true now, but the six-months-from-now-you is going to look back at the you-of-today and wonder what the hell he was thinking.


Get in the habit of making your code easy to use, and expose a nice api. This will endear you to your fellow programmers and help make maintenance easy. Strive to be that guy on the team that writes functions and classes people want to use. Make life easier for your fellow developers and even if they don't return the favor, maybe they will take you out for a beer. People notice over time that your code is the best to work with, they internalize it, and they start to think of you as a great programmer. That's a pretty great api to expose to the world.

20100111

grok

[caption id="attachment_565" align="alignright" width="282" caption="necessary part of grokking something"]banging head[/caption]

Computer nerds should be familiar with the term grok, while the rest of the populace has no need for such a silly word. You may hear it down at your favorite nerdery: "I don't quite grok this" or "Once I grokked it, I flew through the project." What does this word mean, though, and where did it come from?


Grok, for the non-nerd, means "to understand thoroughly and intuitively." Although this is true, it belies what it truly means to grok something. For that we need to look at the origin of this fantastic geek speak: the term comes straight out of Robert A. Heinlein's Stranger in a Strange Land. He defined it as follows:



Grok means to understand so thoroughly that the observer becomes a part of the observed—to merge, blend, intermarry, lose identity in group experience. It means almost everything that we mean by religion, philosophy, and science—and it means as little to us (because of our Earthly assumptions) as color means to a blind man.

The great thing about grokking something is that it changes your world view; it fundamentally shifts the way that you see, perceive, and understand the world. The bad thing is that it is incredibly difficult to get to that point. Let's take a look at a fantastic chart I've whipped up to help us understand this better.


[caption id="attachment_566" align="alignleft" width="300" caption="the graph of grokitude"]grok[/caption]

In this graph we have time and effort along the horizontal axis and understanding along the vertical. We can spend a great deal of time after "mastering" a subject before we get the epiphany and truly grok it. To understand a subject on the level that a computer nerd would call grokking takes not only a full, in-depth knowledge of how something functions, but an understanding of its design, an internalization of its principles, and a familiarity that borders on oneness.


This is the reward for hundreds or thousands of man-hours spent toiling in something: the day when the clouds part and you truly understand it. Once you grok something, be it a framework or a library or a concept, your ability to use it and your productivity with it increase in a non-linear fashion.


Although it is important to have a wide breadth of knowledge, we must remember the importance of having a few areas of incredible depth. These areas will permeate everything else you do, so try to choose wisely. The subjects that you truly grok are the ones that will affect your big-picture outlook.


Here are my suggestions for some fundamental things to learn, and learn to a point that it fundamentally shifts the way you look at the world.



  • A Pure Object-Oriented Language - I would suggest smalltalk for purity, but ruby makes a fine substitute and has more application.

  • A Functional Language - For this look no further than Haskell, and Learn you a Haskell for Great Good

  • A Set Language - Take the time to learn SQL; it's a tool most of us use every day and very few of us take the time to learn properly.

  • A Markup Language - HTML + CSS, these are the embodiment of separating data and display, learn them intimately.


Learning these, but more importantly grokking them will color the way you approach problems and solutions. Having a depth of knowledge in a key variety of tools allows you to quickly master new ones, better understand problem domains, and more completely express your thoughts in the abstract.

20091228

modeling

[caption id="attachment_479" align="alignright" width="225" caption="Just like this, only less beautiful women and more boring data"]america's next top model[/caption]

When solving a problem, the most difficult part should almost never be the implementation. Implementing a solution should follow fairly straightforwardly from the solution itself; if it does not, then your solution is incomplete. That doesn't mean you need to sit down with pen and paper and completely solve a problem before coding, though that is the approach taken by some books attempting to teach software development. Some go as far as to advocate an even stricter approach, something along these lines.



  1. Model the problem domain with UML or some other modeling technique

  2. Translate your modeling into pseudo-code

  3. Translate pseudo-code into actual code

  4. Realize 2 month project took 2 years to complete

  5. Go to step 1 (the problem has radically changed since you took so long)


Of course, I'm poking fun at the books that take such a structured route. Here is a news flash for everyone: Programmers are lazy. Here is another news flash (and this might actually be news to some people): Being lazy is good. Not all laziness is good; if it causes you to cut corners and fail to deliver a high quality product, then you fail at life. If however your laziness drives you to find an easier way to do things, then "Lazy FTW!"


I have the "joy" of programming Java in my 9 to 5. When you write Java for a living you get used to the absurd amount of verbosity in your code; you also have the joy of Eclipse (notice no quotes around joy), which will write most of your Java code for you. Then there are things like Project Lombok that strive to make Java easier and more fun to write. C# got the memo and let its programmers be lazy. Let's take a look at the same simple class in a few languages.


Here is some C# code to make a class called Foo with a string property called Bar that can be read and written:
[csharp]
public class Foo {
  public string Bar { get; set; }
}
[/csharp]

They realized people are lazy, let's look at the same thing in ruby


[ruby]
class Foo
  attr_accessor :bar
end
[/ruby]

Again, concise and simple, let's look at some Java


[java]
public class Foo {
  private String bar;

  public String getBar() {
    return this.bar;
  }

  public void setBar(String bar) {
    this.bar = bar;
  }
}
[/java]

Can you hear it? I can. It's the language architect saying, "Just don't be so lazy, write those simple functions, it isn't hard." The truth of the matter is that it isn't hard; in fact, Eclipse will happily write these functions for you, just hit CTRL-SHIFT-G. I'm sure there is a point here that I'm getting to, and here it is. I don't want to disparage Java or start a language war; what I want to point out is that Java has a different conceptual model than C# or ruby. C# has the concept of properties and ruby has the concept of attributes. Java has nothing like this: classes have members and functions, nothing more.


The point is that the conceptual models your solution adopts will have huge ramifications for the implementation of the solution. There are many examples of unifying philosophies that make things simpler: Unix's concept that everything is a file, C++'s concept of streams, Erlang's pervasive share-nothing message passing, ruby's duck typing. These are core concepts that have profound and far-reaching consequences.
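Ruby's duck typing is a nice, small example of such a unifying concept; here's a quick sketch (the Duck and Robot classes are invented for illustration). Any object that responds to the right message fits, no shared base class required.

```ruby
# Duck typing: "if it quacks like a duck..." No common superclass needed.
class Duck
  def speak; "quack"; end
end

class Robot
  def speak; "beep"; end
end

# This method doesn't care what it gets, only that it responds to #speak.
def make_it_talk(thing)
  thing.speak
end

[Duck.new, Robot.new].map { |t| make_it_talk(t) }  # => ["quack", "beep"]
```

One simple rule, "objects are defined by what they respond to," and it shapes how every ruby library gets designed.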


Working on a new side project, I was recently struck by a way to simplify a complex data structure by adopting a new conceptual model. Creating an internally consistent, elegant conceptual model is the most important thing you can do for a project's success.


Conceptual modeling is a hard thing to get right: go into too much detail and your model becomes rigid and brittle; go into too little and the model can be impossible to implement correctly. Making a good conceptual model is like making a good analogy, and there are a few hallmarks of a good one.



  • The conceptual model should simplify the problem domain. Files and folders greatly simplify the concept of a hierarchical file system by relating it to something commonplace.

  • The conceptual model should not be overly complex to translate into an implementation

  • The conceptual model should be uniform and consistent, if you find parts of the problem domain falling outside the conceptual model or feeling "tacked on" you should rework the model


The next time a project seems stuck or overly complex, instead of refactoring code or attempting to throw another library at it, take a step back, look at the big picture, and refactor your conceptual model.

20091215

plain stupid

[caption id="attachment_409" align="alignright" width="136" caption="typical dumbass"]typical dumbass[/caption]

Are you a dumbass? It's an important question to ask yourself every once in a while, especially if you want people to put their trust in you. This has been brought up because of a service called RockYou that recently exposed 32,603,388 plaintext passwords. Then it got even worse when it was confirmed that RockYou was also storing 3rd party passwords, again in plaintext.


Passwords are a tricky thing to store, and RockYou clearly never got the memo. I've been at this whole "writing software" thing for a while and I've gone through my own personal journey of discovery about vulnerabilities and best practices. Let's walk through that now.



  • Plaintext - This is the first stage, the one that RockYou apparently never left. It is the simplest to implement: take the password, shove it in the DB. Need to check against the password? if(input == password) and done! If you don't see the problem with storing your passwords in plaintext, go to the nearest hard surface and slam your head into it over and over again; you no longer need cognitive functions.

  • MD5 Hashes - This is generally the second stage of password discovery. Many languages have a nice built-in function (php does). Store that MD5 hash and you get that warm fuzzy feeling in your tummy that you aren't storing things in plaintext. The problem is that MD5 has well known weaknesses that can be exploited: rainbow tables provide an attack vector, and reverse look-up tools open up a ton of exposure.

  • Salty passwords.... delicious - Your users are probably going to pick awful passwords; even if you have a password policy, they'll probably end up with "password1". You can't rely on them to provide strong passwords, so you do the next best thing: make the password strong for them. Static salting is the act of adding predictable data to a password before hashing. Maybe you append the string "Matt~is~the~best~programmer~7829" before hashing. This will defeat most reverse lookup tools (as very few people have bothered hashing "Matt~is~the~best~programmer~7829password1" and putting it into a database of reverse lookups). The problem is you are still open to dictionary and rainbow table attacks, and if your database is compromised there is a chance that your source could be compromised as well; then that string sitting in some configuration file is the key that unlocks everyone's password.

  • Dynamic Salt... deliciously nerdy - Instead of having a static string, you can create some dynamic salt. When a user wants to store a password, generate some random salt, combine it with the password, hash, mangle, and store. Now every hash is its own puzzle: security in layers.

  • Non-trivial calculation times - The problem with MD5 is that it's too quick. Use a more complex hashing algorithm like SHA-1 or SHA-256, and perform data mangling and multiple hashing sweeps so that turning a plaintext password into a stored hash takes some non-trivial amount of computation time. What's the point of this? It may add a second to saving and comparing hashes for you, which is normally negligible in terms of user experience. The advantage comes from defending against a brute-force attack, where the attacker has to generate millions of hashes. Will Bond does an excellent job outlining this in the rationale of the fCryptography class in Flourish.

  • Timeouts - The normal user will have to enter a password once, maybe twice if they fat-finger it. Using an exponential-growth timeout system can help prevent brute-force attacks, or at least slow them way down. A brute-force attack relies on the ability to quickly cycle through thousands if not millions of passwords in an attempt to find the correct one. Let's take a look at how exponentially growing timeouts defeat this approach. Each attempt has a timeout twice as long as the one that preceded it; we will start with a timeout value of 1 second.

    • 1st brute-force attempt fails; the server lets the person retry, but they must wait 1 second.

    • 2nd attempt fails, retry in 2 seconds.

    • 3rd attempt fails, retry in 4 seconds.

    • 4th attempt fails, retry in 8 seconds.

    • 5th attempt fails, retry in 16 seconds.

    • ...

    • 10th attempt fails, retry in 512 seconds. (8 and a halfish minutes).

    • ...

    • 20th attempt fails, retry in 524288 seconds (8738 minutes or 145 hours or 6 days)


    Most users will never experience more than 3 seconds of timeout; a cracker would have to wait 35,702,051 years to attempt 50 passwords.
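If you want to check my math on those timeouts, here's a quick ruby sketch of the doubling schedule (a toy calculation, not production rate-limiting code):

```ruby
# Each failed attempt doubles the wait: attempt n costs 2**(n - 1) seconds.
def timeout_for(attempt)
  2**(attempt - 1)
end

timeout_for(1)   # 1 second
timeout_for(10)  # 512 seconds, eight and a half-ish minutes
timeout_for(20)  # 524288 seconds, about six days

# Total wait to burn through 50 guesses: 2**0 + 2**1 + ... + 2**49 seconds.
total_seconds = (1..50).inject(0) { |sum, n| sum + timeout_for(n) }
total_years = total_seconds / (60.0 * 60 * 24 * 365)  # roughly 35.7 million years
```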


Storing passwords correctly and defending against brute-force attacks is non-trivial. It's one of those things that people a lot smarter than you or I have spent a lot of time thinking about. In a former life I was a mathematician (sometimes I fancy that I still am); I have had the pleasure of picking up a book on formal cryptography, getting about 2 chapters in, and setting it down with a headache. There are a lot of great libraries out there that do this stuff right so you don't have to.
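To make the salt-and-stretch idea above concrete, here's a toy ruby sketch using only the standard library (digest and securerandom); it illustrates the scheme described in the list, it is not a substitute for a vetted crypto library (use one of those!):

```ruby
require 'digest'
require 'securerandom'

# Toy salt-and-stretch scheme: a random per-user salt plus many hashing
# sweeps, so every guess costs an attacker real CPU time.
ITERATIONS = 10_000

def hash_password(password, salt = SecureRandom.hex(16))
  digest = salt + password
  ITERATIONS.times { digest = Digest::SHA256.hexdigest(digest) }
  [salt, digest]  # store both; the salt is needed to verify later
end

def password_matches?(password, salt, stored_digest)
  # Recompute with the stored salt and compare.
  hash_password(password, salt).last == stored_digest
end

salt, stored = hash_password("password1")
password_matches?("password1", salt, stored)  # => true
password_matches?("letmein", salt, stored)    # => false
```

Because the salt is random, two users with "password1" get completely different hashes, which is exactly the "every hash is its own puzzle" property.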


For php there is Will Bond's fCryptography; for perl there is Crypt (I'm not a perl person, so anyone is more than welcome to suggest a better alternative); for almost any language there is a high-quality cryptography library. If you are not using one, then you are simply being lazy.


Of course you could side-step the whole issue and let someone else do your dirty work. It's all a symptom of a problem that Mozilla Labs is trying to solve: the concept of identity on the web. I think the future will be the browser acting as the trusted agent instead of third-party websites, but until then we have to be responsible software developers.

20091125

refactor

[caption id="attachment_292" align="alignleft" width="225" caption="google thinks this means refactor"]google thinks this means refactor[/caption]

I've been working a lot on prosper lately (which means that if you want to play around with it, grab the bleeding edge on GitHub). I recently added a lazy loading feature, mostly to facilitate unit testing, and pretty soon I will be adding some unit tests. I was able to take care of some things recently that have bugged me for a while, and it's because of ruthless refactoring.


Refactoring is not black magic; it just means taking some code and reworking it: rename a function here, encapsulate some functionality there, done. The sticky part is that refactoring is a compounding problem. Things that should be refactored build up on each other, so if you put off refactoring for too long you will have a real mess on your hands. Refactoring can be painful but should never be put off. You see a function named poorly but know that it is used in 20 places? Fix it now, don't put it off, because by the time you get around to renaming it, it will be used in even more places.


A good IDE will help you in your refactoring process; the one that I love is (surprise, surprise) Eclipse. Eclipse is brilliant at refactoring, probably because it keeps semantic knowledge with symbol names. Rename the function foo to bar and you don't have to worry about Eclipse destroying all those variables named foo (ps. if you have variables named foo, you have refactoring to do!). Eclipse (and other IDEs) are great at all kinds of refactoring: renaming, extracting and encapsulating, pull up, push down, etc. Get to know these tools, get comfortable using them, and refactor without fear.


Wait, change big chunks of code without fear, how!? Well, you should have tests, be they unit or functional. I'm introducing unit tests into prosper, but for the time being I have a local version of the todo list case study. The todo list application is pretty good because it exercises all the important bits; if I change something and everything in there keeps working, I haven't broken anything too badly.


The reason I want to introduce unit tests, though, is that I've introduced regressions even with the todo list humming along: some dark corner of prosper has started shooting out unicorns, but I haven't noticed until I see some bit of code that I know can't be doing the right thing. Test coverage allows you to refactor without fear, and if you want a living project that people might someday use, you need to refactor without fear to keep things current.
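To give an idea of the kind of safety net I mean, here's a tiny ruby example; the slugify function is invented for illustration (prosper's real internals are nothing like this). Rename it, rewrite it, inline it: as long as the checks still pass, the refactoring didn't break the contract.

```ruby
# The function under test: its contract is "turn a title into a url slug."
def slugify(title)
  title.downcase.strip.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
end

# A couple of tiny checks standing in for a real unit test suite.
raise "regression!" unless slugify("Hello, World!") == "hello-world"
raise "regression!" unless slugify("  Prosper 0.6  ") == "prosper-0-6"
```

Run these after every change and the unicorn-shooting dark corners announce themselves immediately instead of months later.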




That's it for the technical part of this blog post, I would like to announce a few things though.



  • The move to GitHub has been completed and I'm starting to get into the workflow of using it now.

  • The new skunkworks project is nearing a beta release, so keep an eye out.

  • Development on the new project has spurred improvements in prosper, there will be a point release (0.6) soon

  • There probably won't be a post tomorrow as I'm traveling with Heather, my beautiful girlfriend, up to see my family in Cleveland.


That's all for now, I will have something up Black Friday more than likely, until then, Happy Turkey Day everyone, safe travels, see you after I put another couple of pounds on.

20091112

teaching

[caption id="attachment_113" align="alignright" width="226" caption="typed \'classroom discipline\' into google"]typed 'classroom discipline' into google[/caption]

For those of you unaware of my background, it may surprise you to find out that I have two degrees: one in Computer Science, the other in Theoretical Mathematics, the kind you can't use for anything ;). During my day-to-day, 9-to-5, actually-getting-paid-to-do-stuff job, I work all day using the Computer Science part of my brain, programming various things. This has left the Math part of my brain atrophied and depressed; I barely visit, hardly ever call, and never send it flowers anymore. But for this post I will be dusting off the Math part of my brain. Don't worry, nothing too in-depth. For those who want to skip to the point, click here.


My senior year I took a 400 level course in the History of Mathematics, it was probably my favorite and most interesting math course. We discussed the evolution of number systems and mathematical theory from the Greek Ionian Numerals to Modern Abstract Algebra, it was a fantastic journey through human knowledge. I will always remember the lesson on Euclidean Geometry and most importantly why it was taught.


Euclidean Geometry has 5 basic principles (called axioms) upon which all other theorems, corollaries, and other fancy math talk are built. The important thing about Euclidean Geometry is not the subject that it examines but the way in which it examines it. When Euclid penned Elements, it was the first major work built on a logical system and comprehensive deduction.


The whole idea was that within this beautiful system, no matter how big and magnificent the structure became, you could always walk a theorem down through the structure to the axioms without making any leaps of logic. This used to be the point of teaching Euclidean Geometry: it is a concrete example of a deductive and logical system, working on objects simple enough that children can understand it.


Sadly, we no longer teach the most important part of Euclidean Geometry to our children in America. The important part, deductive reasoning, is very difficult to teach; it requires magnitudes more work than what we currently teach, mechanical geometry. We have children memorize rules and tricks and mnemonic devices so that they can perform mechanical operations on lines and angles and shapes. They learn nothing from this, they will forget these machinations, and they will never have learned the deductive reasoning skills that seem so lacking these days.


What am I writing about, though? Why does this belong on a programming blog? Have I gone mad? To answer these questions in order: this is an interesting story you just got for free (ingrate) and will dovetail nicely with the next section; its reason for existing will become apparent in a second; and you can't go somewhere you already are. The point of the above is that in teaching a concrete something (in this case Geometry) you might actually be trying to teach an abstract something else (in this case Deductive Reasoning and Logic).


As a Mathematician I think that we should go back to teaching Euclidean Geometry the right way, but as a Computer Scientist I like flashy, fun, interactive, keyboardy things, so let's talk about those. You see, there is another way to trick kids into learning about structured logic and deductive reasoning: programming. We can, if we try a bit, even make it fun. Let's take a look at past and present attempts to do just that.


Logo


I remember you fondly from my childhood, Mr. Turtle. Logo was a simple language built at MIT on top of LISP, possibly the seed of my love for LISP was planted by this simple turtle. The idea was to give the child a simple interface with a little turtle that they could command around using a simple syntax.
[caption id="attachment_105" align="alignnone" width="488" caption="mit logo guide"]mit logo guide[/caption]
[sql]
FORWARD 50
RIGHT 90
FORWARD 50
RIGHT 90
FORWARD 50
RIGHT 90
FORWARD 50
RIGHT 90
[/sql]
That code would draw a nice simple square, 50 pixels by 50 pixels, but Logo didn't stop there; it had functions and loops and all manner of list processing. The above code could be simplified to
[sql]
REPEAT 4 [ FORWARD 50 RIGHT 90 ]
[/sql]
and suddenly the child would learn loops. It was great fun and is still used today to teach programming to young children. There are still active versions, like FSMLogo for Windows or ACSLogo for Mac.


Alice


Alice is a new attempt to make programming easy and fun; it allows you to start making 3D scenes almost immediately. It is free from Carnegie Mellon, and they do a pretty good job summing up what it is:



Alice is an innovative 3D programming environment that makes it easy to create an animation for telling a story, playing an interactive game, or a video to share on the web. Alice is a freely available teaching tool designed to be a student's first exposure to object-oriented programming. It allows students to learn fundamental programming concepts in the context of creating animated movies and simple video games. In Alice, 3-D objects (e.g., people, animals, and vehicles) populate a virtual world and students create a program to animate the objects.

Here is a video introducing Alice



Some of the innovative features are the fully 3D environment, the drag-and-drop source editing in which code constructs are represented by physical objects, and the integration with Java and the JVM.


Scratch


Another effort from MIT. Scratch's motto is "imagine, program, share" and they live up to it. Like Alice, it uses a drag-and-drop source editing system, but there is a much greater emphasis on the social aspect of it all. In a web 2.0 nod, they encourage people to upload their projects and let others mix and match them in fun and interesting ways. It's easy for kids to jump right in because there are just under 600,000 projects being shared on the Scratch website, so there are plenty of examples. Here is the Scratch video.


Scratch from Karen Brennan on Vimeo.


Hackety Hack


[caption id="attachment_112" align="alignright" width="150" caption="HacketyHack Logo"]Hackety Hack Logo[/caption]

I didn't know whether or not to include this one; the creator _why decided to leave the internet behind a while ago and as such abandoned this project. It has been picked up and rescued by Steve Klabnik. It is based on Ruby and allows kids to get interesting programs up and running quickly because of the powerful and intuitive libraries included. It is an integrated IDE with tutorial, editor, and runtime all rolled into one. The best part of HacketyHack is that so much of what you learn can be transferred directly to the mature Ruby language. You can start your kids off with HacketyHack, and when they become jaded nerdy teens they can start their own website in RoR to talk about Twilight or whatever the hell it is teenagers do.


Lego Mindstorms


This last one comes from Lego and costs some cash unlike the previous offerings, but the money gets you a badass programmable robot. It has a drag-and-drop development environment similar to Alice and Scratch. Kids (and adults) can get really into it, making lots of cool things and holding sweet Lego competitions. It's one thing to see "Hello World" on a monitor; it's an entirely different feeling to have a robot do your bidding.




So that's the educational round-up. With these tools we can find something to appeal to the youth of today and hopefully teach them some sweet, sweet logic. Give it a try with your kids, or try to start a program at a local school or social club. Spreading the joy of programming will enrich their lives and your own.

20091110

convention vs convenience

[caption id="attachment_84" align="alignright" width="300" caption="convenience, american style"]convenience, american style[/caption]

I'm a huge believer in Convention over Configuration because it makes life easier and makes me more productive. I used to program in Java, and unless there is some seismic shake-up, I will soon be going back to Java. I like Java well enough; it's no Lisp or Ruby, but it has its place in the business world. I have many gripes with Java (verbosity, complexity, etc.), but when you are in the enterprise, trying to work with a bunch of third party pieces cobbled together into a hulking software nightmare, the worst, by far, is configuration.


It makes some business sense: if a customer won't use your product because they want to change the tooltip on the help page and they can't without hiring a Java programmer, or can't at all because you distribute closed source .jar files, the simplest solution is to toss a config file at them. Just change this or that setting in the config file and, look at that, the whole application is Christmas colored and in Sanskrit. There are some huge problems with Java configuration though.



  1. The unfortunate pairing with XML - XML hit its high water mark around the same time as Java (maybe there was some reciprocal love there), and it can be maddening.
    [xml]
    <env-entry>
    <env-entry-name>maxExemptions</param-name>
    <env-entry-value>10</env-entry-value>
    <env-entry-type>java.lang.Integer</env-entry-type>
    </env-entry>
    [/xml]
    Oh fuck me, are you serious?! I took that from the official Tomcat Documentation. It just sets maxExemptions = 10, but it takes 5 lines, 2 layers of nesting, 4 open tags, and 4 close tags, and as you can see, this straight copy and paste from the official documentation has a pretty clear error. Clear to me only because my eyes have been trained to read XML like a champion: the env-entry-name tag is closed by a param-name tag, and that isn't right.

  2. Undocumented DSL - Every configuration is basically an undocumented Domain Specific Language wrapped up in XML's ugly ass clothing. There is little transfer of knowledge between Java and a Java Configuration file, or even between XML and a Java Configuration file.

  3. Undiscoverable - Maybe it would be more accurate to say hard to discover. Where do you go to figure out what belongs in your config, or which nodes configure what? Your best bet is to hope the developer wrote up (and subsequently kept up to date) some documentation. You get little to no help from your favorite IDE's code completion, and the constrained nature of the problem domain makes web searches less likely to yield helpful information.

  4. Twiddling - Like in Field of Dreams, if you make a configuration, they will come. Sure, there is no good reason for X to be configurable, but I don't like having hardcoded values in my code, so every hardcoded value is now configurable, and I'm on the slippery slope of softcoding. This allows the end user too much power to twiddle around and configure things that don't ever need to be played with; just because we can doesn't mean we should.


This isn't just Java's problem; they are just the easiest to pick on because it seems like it's everywhere. Then Ruby on Rails came onto the scene and popularized this idea, convention over configuration. I want to put my models in the fliggity directory? That's too bad, they go in the model directory. I would like to name my table that stores user data 'tc_people_datastore'? Yea, well I would like a billion dollars; you are going to call it users. This means that if tomorrow I'm told to go work on a RoR project, having never seen it, I would have a good idea how the project is laid out, where the data lives, and how everything is hooked together. This convention eases the mental load I have to carry around, replacing it with simple, sane rules.
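The idea is easy to sketch in this blog's usual PHP. Here is a tiny hypothetical model base class (the class and its one rule are invented for illustration) that derives a table name by convention instead of reading it from a config file:

[php]
// Hypothetical sketch: the table name comes from a naming convention,
// not from a configuration file
class Model {
    static function table_name($class) {
        return strtolower($class) . 's';   // 'User' becomes 'users'
    }
}
[/php]

A real ORM would handle irregular plurals and allow overrides, but the point stands: one sane rule replaces an entire config entry.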


Convention over configuration, especially the rails way, has been called opinionated software. The software, in this case rails, has an opinion about how things should be laid out, what your tables should be called, etc. I'm in the midst of writing my own software and API and I've decided that my software should have an opinion about stuff, but more importantly that things should follow certain conventions.


[caption id="attachment_86" align="alignleft" width="205" caption="conventional breakfast"]conventional breakfast[/caption]

A corollary to the conventions is a drive for consistency. Maybe a better title for this post would have been consistency vs convenience, but I'm already 700 words in, no going back now. I'd like a consistent API for a few reasons. Consistency makes it easy to remember: does this function go ($needle, $haystack) or ($haystack, $needle)? They all go ($needle, $haystack), calm down. Consistency makes wrong code look wrong; after seeing the same type of thing over and over again, the pattern gets burned into your brain, and any deviation is obvious. Also, I'm a little OCD, and making things consistent just feels better.
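The needle/haystack rule can be sketched with a pair of hypothetical helpers (the names are invented for illustration; what matters is that both agree on the argument order and on returning FALSE for a miss):

[php]
// Both helpers take ($needle, $haystack), never the reverse,
// and both return the position or FALSE when nothing is found
function str_first($needle, $haystack) {
    return strpos($haystack, $needle);
}

function list_first($needle, $haystack) {
    return array_search($needle, $haystack);
}
[/php]

Once every search function in the API agrees on that order, nobody has to check the manual before calling one.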


The problem is that convention taken too far leads to an ugly little town called boilerplate, and no one wants to live there. This is where convenience comes in; it allows you to more-or-less follow convention while giving yourself an out to skip over the obvious parts. The trick is striking an appropriate balance. I have a story of failure and redemption that I will quickly share with you.


My new side project is written in PHP and makes use of the ubiquitous associative array when it makes sense to. I love me some PHP, but I hate the associative array literal syntax, and it's not going to change anytime soon. For those of you unfamiliar, here is how you would make the same associative array in JSON and PHP.


[javascript]
var example = {'a': '1', 'b': '2', 'c': '3'};
[/javascript]
[php]
$example = array('a' => 1, 'b' => 2, 'c' => 3);
[/php]

Doesn't look too bad or different, but if you have nested arrays or use an array in a function signature (my case) it gets ugly pretty quickly. I have also been reading up on Lisp a lot recently, and it has this idea that successive arguments can act as a pair. I thought this was a pretty nifty idea, so I set about creating an alternative calling convention.


[php]
$foo->bar(array('name' => 'Matt', 'age' => 23));
[/php]

Would be identical to


[php]
$foo->bar('name', 'Matt', 'age', 23);
[/php]

I thought that this looked much nicer, and it is fairly trivial to implement


[php]
class Foo {
    function bar($values) {
        if(func_num_args() > 1) {
            $values = self::associate(func_get_args());
        }
        ...
    }

    static function associate($args) {
        $result = array();
        $count = count($args);
        if($count % 2 == 1) {
            $args[] = null;
            ++$count;
        }
        for($i = 0; $i < $count; $i += 2) {
            $result[$args[$i]] = $args[$i + 1];
        }
        return $result;
    }
}
[/php]
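A quick sanity check of the idea; here the pairing logic is repeated as a standalone function (so the snippet runs on its own), covering both the even case and the odd case that pads with null:

[php]
// Same pairing logic as associate() above, as a free function
function pair_up($args) {
    $result = array();
    $count = count($args);
    if($count % 2 == 1) {       // odd number of args: pad the last key
        $args[] = null;
        ++$count;
    }
    for($i = 0; $i < $count; $i += 2) {
        $result[$args[$i]] = $args[$i + 1];
    }
    return $result;
}

$even = pair_up(array('name', 'Matt', 'age', 23));
// $even is array('name' => 'Matt', 'age' => 23)
$odd = pair_up(array('name', 'Matt', 'age'));
// $odd is array('name' => 'Matt', 'age' => null)
[/php]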

I had done it, bam, Lisp-style associative arguments in PHP. The problem though is that, well, wtf? That is going to be the reaction of any PHP programmer unfamiliar with the Lisp convention. I failed to follow the conventions of the language, so this morning I tore this code out. It added a secondary way to call a function, it introduced several edge cases, and the benefit was dubious. I was scratching my language-implementer itch, but not in an appropriate fashion. This time convenience had to be sacrificed for convention's sake.

20091109

api design

[caption id="attachment_76" align="alignright" width="197" caption="blinkenlight interface"]blinkenlight interface[/caption]

I'm still working on my skunkworks side project; over the weekend I had the joy of integrating several third party PHP libraries. I got to spend a good amount of time on php.net reading over APIs and figuring out how to fit them into my project. Some of them were sublime, as though the author had read my mind and knew my exact mental model. Some of them were abominations, fighting me all the way. This got me thinking about the design of a good API.


What makes an API good? There are a few things that make an API really nice to work with.



  1. Similarity of behavior - Writing an API that does searching through a b-tree? Look at how searching is implemented for arrays or strings, and then copy the crap out of that API. This allows the developer to use all that knowledge they've built up about searching, so if I know [php]array_first($needle, $haystack)[/php] returns the first instance of $needle in $haystack or FALSE on failure to find $needle, then [php]btree_first($needle, $haystack)[/php] should work the same way.

  2. Readability - Your API should make code that is readable, the function names should be descriptive (without being overly verbose), and code written with it should flow nicely. Avoid using difficult to pronounce function names like strcspn.

  3. Minimalism - You're writing an API because you are doing something non-trivial, something complex enough that you want a simple looking API to interact with, so do exactly that, make it simple. I'm sure that reflangulating the zyffer is a complex process that involves juxtaposing the allibaster and repeppering the kilgore while making sure not to narfle the garthok, but the reason you are writing an API is to hide that complexity away, don't write a Leaky Abstraction. Allow me to write code like this
    [php]
    $zyffer = new Zyffer();
    $zyffer->reflangulate();
    [/php]
    Not like this
    [php]
    $zyffer = new Zyffer();
    $zyffer->juxtapose(Zyffer::allibaster);
    $zyffer->denarfle(Zyffer::garthok);
    $zyffer->repepper(Zyffer::kilgore);
    $zyffer->reflangulate();
    [/php]
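To make point 1 concrete, here is a sketch of the hypothetical btree_first mentioned above, copying the familiar contract: return the first match, or FALSE on a miss. The BTree class here is a trivial stand-in invented for illustration; a real b-tree would obviously do an actual tree search.

[php]
// Trivial stand-in so the sketch runs; a real b-tree would be more involved
class BTree {
    private $values;
    function __construct($values) { $this->values = $values; }
    function to_array() { return $this->values; }
}

// Hypothetical: same contract as the array version,
// first match or FALSE when $needle isn't found
function btree_first($needle, $haystack) {
    foreach($haystack->to_array() as $value) {
        if($value === $needle) {
            return $value;
        }
    }
    return false;
}

$tree = new BTree(array(1, 2, 3));
[/php]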


In the course of writing the API for my side project I've found it useful to put myself in the shoes of a new programmer trying to use my API. How long would it take them to figure out X? How often would they curse my name? What is the WTFs/min ratio looking like? It has been a helpful tool to adopt that mindset and ask myself: why am I requiring this parameter, why do they have to call this function before that function (and would that be apparent), why am I making their life so difficult? It has helped me slim down my API considerably, and this combined with Convention over Configuration has led to an API approaching "not terrible."


Then it hit me: I have stumbled upon a big important rule that I've implicitly been following for years, but now my brain is aware of it.


You should write everything as if it will one day be a public API.

Of course, like any rule that is written in absolutes, there are sure to be exceptions. But I think on the whole it will serve you well, for a few reasons. Public APIs are written to be simple to work with, which means that after you've encapsulated all the hard complicated stuff you can interact with a nice clean API. This will make your life nice when working with your API, but the best part is maintenance. Remember that new programmer we imagined to help write the API? That will be you, intrepid API writer, just a few short months after moving on. You will inevitably be called back to add a feature or fix a bug, and if your API is simple to pick up then you will remember more of it, and relearn the parts you forgot that much quicker.


Another advantage is that a well-written API is much more likely to encapsulate well-written code. When your API is separated nicely and parceled up cleanly into well defined units of work, the code powering them will probably have a well defined separation of labor and understandable flow. The API itself becomes the documentation for how a process is accomplished, the various actors nicely laid out as well defined classes and the behaviors as fancy-pants interfaces. I'm certain you can create a clean API over a horrible pile of spaghetti code, just as surely as you can create a crap API over a beautiful collection of clean OOP, but a good API encourages good code.


At the end of the day the API is the face of your code. You can write up all the pretty documentation and how-to's and promotional websites with sweet web 2.0 reflection and fun oversized graphics, but when it gets down to brass tacks the developer is going to be instantiating your objects and calling your functions. The fact that there is a pretty floating cloud icon isn't going to make a developer feel any better that she just spent 4 hours figuring out that she forgot to juxtapose the allibaster, which made shit hit the fan when she called the promulgate function on the zyffer's subclown. Make sure that your code has a pretty face; even if it's only you using it, the benefits will far outweigh the minor upfront costs.

20091105

duck typing works

[caption id="attachment_62" align="alignleft" width="226" caption="Design Patterns in Ruby"]Design Patterns in Ruby[/caption]

I recently had the joy of picking up the book Design Patterns in Ruby, which is a great book; it made my Christmas wish list. If you have been living on the moon or are just one of those weird people who haven't heard of Ruby yet, it is a dynamically typed interpreted language made famous by the web framework Ruby on Rails.


Do you see what I just did right there? I gave you some background information, information that you more than likely don't need; you are reading my blog, you know what Ruby is. Russ Olsen (author of Design Patterns in Ruby) does the same thing with duck typing. What is duck typing, you ask? (Again, you probably didn't; you know what it is, or you are familiar with this thing called Google.) Well, it comes from the saying "If it walks like a duck and quacks like a duck, then it's a duck," and according to the authority on all human knowledge it can be defined as the following:



In computer programming with object-oriented programming languages, duck typing is a style of dynamic typing in which an object's current set of methods and properties determines the valid semantics, rather than its inheritance from a particular class or implementation of a specific interface.

One of the properties this gets you is the ability to pass any kind of variable anywhere, and as long as it acts like the object the function is expecting, it will work, no matter what class it is. In practice this allows for some great code reuse and, as described in the book, the evaporation of some classic GoF design patterns. Olsen goes into a bit of depth about duck typing and the classic fear of someone coming from a strongly typed language.



So there are no compile time or static checks. What stops you from passing a database connection into a function that accepts an int, or calling the give_big_fat_bonus method on a lowly worker instance instead of an executive instance, or of course the ceo instance which is a class derived from executive?
Anonymous Object Oriented Programmer

The answer is a resounding nothing! (Dovetails nicely with my last post.) Nothing stops you from doing this; go ahead, see what happens. It depends on the language: in Ruby you get a NoMethodError, and you get something like that in most duck typed languages.
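The same behavior is easy to demonstrate in this blog's usual PHP (the classes here are invented for illustration): the function never checks the class, only whether the method is there when it's called.

[php]
// Neither class shares a parent or an interface; both just respond to speak()
class Duck  { function speak() { return 'quack'; } }
class Robot { function speak() { return 'beep';  } }

function make_it_speak($thing) {
    return $thing->speak();   // works for anything with a speak() method
}
[/php]

Pass in anything without a speak() method and you get PHP's flavor of the NoMethodError above: a fatal "call to undefined method".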


The question then is how do people write any code that works, and the answer is: by not being full-on stupid. Ruby, and other duck typed languages, have taken to heart the sentiment of my post i can't stop you from being stupid (the shameless self promotion is getting a little rampant at this point). The language can't stop you from treating an int like a database connection, and really, why should it? What in the hell are you doing mixing up an int and a database connection? You deserve to get punched in the face; a NoMethodError is the least of your worries.


But accidents happen; say your db variable holding the database connection and your dv variable holding the direct value modifier amount are dangerously close on the keyboard. Are you going to just pray that you are 100% correct in your typing all the time? Who do you think you are, me?! (I'll wait patiently as grammar-nazis and typoistas tear apart this post.) This is why unit testing is so huge in duck typed languages. If you are using sweet delicious Lisp then the REPL is your friend; Ruby also has a REPL (irb), but it's not nearly as integral to the development process as the REPL is to Lisp. Your tests let you know that you aren't doing something stupid, and perform all the static and dynamic checking one could want.


So go out and give a duck typed language a shot; pretty soon you will enjoy the added speed, and you will realize that static syntax checking is not necessary if you know what the hell you are doing.

20091104

the power of nothing

Sometimes you get a bright idea, or you are given a difficult task, and you want to start working on it. It's intoxicatingly easy to dream up feature upon feature for a new project, or build a giant architecture to solve a problem; all of these things look powerful. The other problem we run into is that with software anything is possible with enough code, and the people who write code like to work on interesting complex problems. This leads to conversations like this.



  • First Dev: I want this new project to seamlessly integrate with eBay, digg, and amazon

  • Second Dev: Those seem arbitrary, maybe we should define some sort of data import protocol

  • First Dev: Yea, then we just need a scraper piece that uses the adapter pattern to translate various websites into the standard import protocol and feed the project

  • Second Dev: I'll start defining a dtd and you can start writing up some xslt for eBay and digg and amazon as a proof-of-concept

  • First Dev: Great, then we just need to get the exchange support and the poor man's cron jobs running, to send out the periodic updates

  • Second Dev: Should we integrate with Twitter?

  • First Dev: I don't see why not

  • Third Dev (interrupting): Wait, what are we developing?

  • First Dev: Todo list

  • Third Dev: Maybe we should get it to store and delete todo list items first

  • Second Dev: That's the easy part, the multitouch integration is going to be the hard part

  • Third Dev: ...


[caption id="attachment_57" align="alignright" width="300" caption="the dream"]nothing todo list[/caption]

Before we get the product to do something simple, something any moron could code up, we want to start slapping on features and bells and whistles and whizbangs. This is all fun, because whizbangs are fun things to think about and code, and it's all well and good until you realize you have coded up a great feed reader, but one hell of a lousy todo list. Focusing on a minimum viable product in the beginning gives you a concrete base to start applying whizbangs to, based on real world feedback and grounded in reality. Incrementally building up value on an existing product is easier and makes more sense than building a giant all-in-one application that can do everything but butter your toast, and that no one wants to use.


So this leads me to the best way forward, in my estimation and experience: nothing. Nothing is incredibly powerful and tremendously liberating, and it allows you to get something up and running in minimum viable mode or to solve a complex issue.


I can hear it already: what the hell are you talking about, you mentally unstable person?! Let's look at some tools that exemplify nothing, and how you can go from nothing to something quickly and incrementally, but only if you start with nothing.


Mock-ups. What is a mock-up? Nothing really, just a fancy picture of what you want an interface to look like. There are many great tools for making mock-ups; a new one that just hit beta is mockingbird, and a more seasoned veteran in this game is the excellent Balsamiq. These shouldn't help; we all know what a todo list looks like, so why would mocking one up help us? Because even though we are capable of abstract thinking, it is much easier if we have a concrete model to wrap our heads around. Like I said in tool roundup:



[talking about a sweet visualization I made] ...but it is a much better reference when working with this particular part of the system than looking at the database table and trying to assemble parts of it in my brain as the situation demands.

If we don't have a concrete model, we force our brains to construct one on the fly, taking us out of our work and into a conceptualization; having a concrete model frees up precious processor cycles in our brain. A mock-up is also great because it allows us to take nothing, a simple functionless mockup, and quickly turn it into an actual interface. What does that button do? Nothing! What about that one? Nothing!!! Then we can slowly but surely add functionality to our interface until we have a functional minimum viable product. The way we got there was by understanding that we start with nothing and work our way gradually to something. Sure, we could have sat down at the keyboard and pounded away at code for a day or two and then run the thing, but the faster we can get it on the screen and start iterating over it, the better.


The second way that nothing can be useful is when tackling a huge problem; let's look at the contrived feed reader problem from up above. How in the world do we go about solving that?! Well, we employ our good friend nothing, and we build from there. What would the simplest possible solution be?


[php]
$feed = FeedReader::read("http://www.digg.com");
[/php]

We can throw that in our program; now we need to deal with the FeedReader class.


[php]
class FeedReader {
    static function read($url) {
        return '';
    }
}
[/php]

Nothing to the rescue! We just defined our super simple API. We aren't quite to functional yet, but we now have a foothold to iterate from.


[php]
class FeedReader {
    private static $readers;

    static function read($url) {
        $reader = self::get_reader($url);
        if($reader) {
            return $reader->process();
        }
    }

    static function get_reader($url) {
        // PHP can't call constructors in a static initializer, so build lazily
        if(self::$readers === null) {
            self::$readers = array('http://www.ebay.com'   => new EbayReader(),
                                   'http://www.digg.com'   => new DiggReader(),
                                   'http://www.amazon.com' => new AmazonReader());
        }
        return isset(self::$readers[$url]) ? self::$readers[$url] : null;
    }
}
[/php]

Wow, look, we are almost all the way to functionality, amazing!!! Thank you, nothing (that's what EbayReader, DiggReader, and AmazonReader do). Now it's the simple task of implementing those, each of which by itself is a somewhat complicated task and outside the scope of this blog post.
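For completeness, a stub reader is about as literal as nothing gets; one of the three classes named above might start life like this:

[php]
// A do-nothing stub that satisfies the FeedReader contract while we iterate;
// a real version would fetch and parse the actual feed
class DiggReader {
    function process() {
        return '';
    }
}
[/php]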


This sort of top down design can be very helpful: you need to solve a problem, so just pretend that you have an API that already solves it and start calling it, stub out those functions (stub is a fancy computer science term for "make it do nothing"), and watch the power of nothing first hand. Then you fill in the blanks, allowing yourself more nothing functions, and you successfully break up your huge monster task into simple solvable tasks.


When we combine the first technique with the second we can get truly awesome results; we normally call this Rapid Application Development (or the much cooler 1980's name, RAD). RAD can get us to something faster than almost anything, using the power of nothing.