PolyGlot, Inc. News

Wednesday, June 9, 2010

Fuzzy Unit Testing, Performance Unit Testing

In reading Philosophy 101, about Truth with a capital "T", and the non-traditional logics that use new notions of truth, we of course arrive at Fuzzy Logic with its departure from simple binary true/false values, and embrace of an arbitrarily wide range of values in between.

Contemplating this gave me a small AHA moment: Unit Testing is an area where there is an implicit assumption that "Test Passes" has either a true or false value. How about Fuzzy Unit Testing, where some numeric value in the 0...1 range reports a degree of pass/fail-ness, i.e. a percentage pass/fail for each test? For example, tests of algorithms that predict something could be given a percentage pass/fail based on how well the prediction matched the actual value. Stock market predictions, bank customer credit default predictions, etc. come to mind. This sort of testing of predictions about future defaults (i.e. credit grades) is just the sort of thing that the Basel II accords are forcing banks to start doing.
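To make the idea concrete, here is a minimal sketch of a fuzzy assertion for a prediction test. The names fuzzy_pass and suite_score, and the linear fall-off controlled by a tolerance parameter, are assumptions made for illustration; any other mapping from prediction error to the 0...1 range would serve as well.

```python
def fuzzy_pass(predicted: float, actual: float, tolerance: float) -> float:
    """Return a degree of pass in [0, 1].

    1.0 means the prediction matched exactly; the score falls linearly
    to 0.0 once the absolute error reaches `tolerance` (an assumed,
    test-specific parameter).
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    error = abs(predicted - actual)
    return max(0.0, 1.0 - error / tolerance)


def suite_score(scores: list[float]) -> float:
    """Average the per-test degrees of pass into one suite-level figure."""
    return sum(scores) / len(scores) if scores else 0.0
```

A prediction of 105 against an actual value of 100, with a tolerance of 10, would report a 50% pass rather than a flat failure.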

Another great idea (if I do say so myself) that I had a few years ago was the notion that there is extra meta-data that could/should be gathered as a part of running unit test suites; specifically, the performance characteristics of each test run. The fact that a test still passes, but is 10 times slower than the previous test run, is a very important piece of information that we don't usually get. Archiving and reporting on this meta-data about each test run can give very interesting metrics on how the code changes are improving/degrading performance on various application features/behavior over time. I can now see that this comparative performance data would be a form of fuzzy testing.
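As a rough sketch of how that comparison might look, the following times a test and turns the slowdown relative to the previous run into a 0...1 score. The function names and the worst_factor default of 10 (echoing the "10 times slower" case above) are assumptions for illustration, and archiving the durations between runs is left out.

```python
import time
from typing import Callable


def timed_run(test: Callable[[], None]) -> float:
    """Run a test callable and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    test()
    return time.perf_counter() - start


def performance_score(current: float, previous: float,
                      worst_factor: float = 10.0) -> float:
    """Score a run against the archived duration of the previous run.

    1.0 if the test is no slower than before, falling linearly to 0.0
    when it is `worst_factor` times slower (an assumed cutoff).
    """
    if previous <= 0 or worst_factor <= 1:
        raise ValueError("previous must be > 0 and worst_factor > 1")
    factor = current / previous
    if factor <= 1.0:
        return 1.0
    return max(0.0, 1.0 - (factor - 1.0) / (worst_factor - 1.0))
```

A test that still passes but takes 5.5 seconds where it used to take 1 second would score 0.5, which is exactly the kind of signal a plain pass/fail report hides.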

Thursday, April 8, 2010

Too cool for school (or, history anyway)

There has been a flurry of articles and blog posts in reaction to Oracle's Jeet Kaul who, at EclipseCon 2010, said "We need to get the younger generation interested and excited [about Java] just like I was" and "I would like to see people with piercings doing Java programming". There have been some exchanges over "cool" languages and whether "cool" is a good thing. I contributed the following 2-cents on a JavaWorld discussion...

"Cool" led to most computer security problems in the world today

I have been around long enough to have seen several generations of "cool" languages overtaking established ones. Unfortunately, the new language was sometimes a big step backward.

A case in point was when C swept the world, replacing the line of languages like Pascal and the stillborn Ada. Because Unix was starting to get out of the lab and into microcomputers, C gained visibility.

BIG PROBLEM: Pascal (and others) could tell how big an array was, and hence could stop execution when someone attempted to write past the end. C was unable to do this (and because of the language design there was no way to "fix it" in the compiler). This is the security hole that is at the root of all computer virus exploits.

If developers chose new languages (or systems) based on technical merit rather than "cool", we would be decades ahead of where we are today.

As a side note/rant: I have always thought that "cool" was affected by political zeitgeist. It seems more than coincidence that European/Govt-regulated/strong-typing languages (e.g. Pascal/Ada) were replaced by the get-the-compiler-regulation-off-the-programmer's-back ethic of C at the same time Ronald Reagan was selling everyone on the same notion about government. Let the programmer do dangerous things (like convert an array into a pointer) because he knows better...(sounds like let the bankers do whatever they want because they know better than the regulators who after all only learn from past mistakes...wimps!)

Tuesday, March 30, 2010

Moore's Paradox. I'm just saying!

The popular phrase "I'm just saying" has been around long enough for most people to have heard it, but not long enough for its origin to be well documented. I heard a great stand-up comic bit about it in the 1980s by Paul Reiser. There are several blog sites that muse over its origin and solicit theories.

It turns out that the most common definition of the phrase exhibits a logical paradox from Philosophy. The book "this sentence is false" is a collection of philosophical paradoxes, and it describes Moore's Paradox (as developed by G.E. Moore). I summarize it as follows:

Normally, everything that can be said about the world can be said by anyone. I can say the moon is made of green cheese, and you can say it. The state of the world described by me can equally be described by you with no logical paradox...EXCEPT... I can say that the moon is made of green cheese, and I can say that you do not believe that the moon is made of green cheese, but YOU cannot say the same thing. I.e., you cannot say that X is true and at the same time say that you do not believe that X is true. Note that you would not be saying that you could be wrong in your belief; you would be saying that you believe both that X is, and is not, true at the same time. A logical contradiction.

However, whenever you use the phrase "I'm just saying!", you are in effect performing Moore's paradox.

Thursday, December 18, 2008

Where Am I?

In I Am a Strange Loop[1], Doug Hofstadter ponders where one's "self" is located while being mentally absorbed by a situation that is located in a different place than one's body is currently residing. A simple example is reading Jane Austen while sitting in a chair. Another is remote-controlling a robot on the moon. He asks the question "Where Am *I*" (where "I" is his shorthand for soul/self/consciousness).

It reminded me of my very first days exploring the World Wide Web in 1994. I explained to my family, as I gave them a guided tour of my new toy "Netscape", that we could "visit" places all around the world! Look, here we go to the South Pole[2] or Australia[3]! Because in those early days, the web server and the content were actually physically in those places, and because the browser was hardly more than a remote terminal program, it really was like remote logging in to computers around the world, which felt very much like being there. That made switching from one site to another feel like teleporting instantly from one continent to another.

These days, the content about a place, versus the server serving the content, versus the location of the many cached copies (e.g. Akamai), and so forth have blurred "where am I" so much as to be meaningless and not even contemplated anymore. But in the early days, there was a real sense of "I am in Antarctica now!".

[1] "I Am a Strange Loop", 2007, Hofstadter, Basic Books

http://www.kinderwijs.nl/artikelen.asp?postid=188

[2] http://www.usap.gov/videoClipsAndMaps/spwebcam.cfm

[3] http://www.radioaustralia.net.au/

Saturday, January 5, 2008

Shakespeare is (not) Shakespeare

[Ed. Note. This is part two of a series found on "Existential Programming, the blog": "A Rose is a Rose is (not) a Rose"]

In the early part of the book The Stuff of Thought by Steven Pinker, the problem of what-a-name-names is explored with the example of Shakespeare. Pinker distinguishes between Shakespeare the historical figure and Shakespeare the author of the numerous plays, like Hamlet, attributed to Shakespeare.

In my earlier post, it was somewhat easy to see that there were multiple aspects to Superman because each aspect already had its own name: Superman vs Clark Kent. With Shakespeare, however, it is much more subtle because the different aspects have the same name: Shakespeare. Additionally, we are not used to thinking that they are different aspects that can be independent of each other, any more than we think of Cher-the-person and Cher-the-singer as being independent things. But, as discussed in the book, many people over the centuries have debated whether the author of Hamlet et al. was really Francis Bacon, Christopher Marlowe, Queen Elizabeth, etc.

The interesting thing is that Shakespeare is SO ingrained as the name of the playwright that even if Sir Francis were to be proven the author, the headline would be "Bacon is the REAL Shakespeare!", which is absurd because clearly Shakespeare-the-historical-figure is the "real" Shakespeare. Changing the human associated with the author-of-Hamlet concept will not change the concept's name; it will remain "Shakespeare's Hamlet (written by Bacon)" and not "Bacon's Hamlet".

So, when assigning ID#(s) to putatively single entities, flexibility should be built in to allow ad-hoc collections of attributes of any entity to be grouped and named and referenced separately. Otherwise, the system would not be able to represent the statement: Shakespeare is not Shakespeare, Bacon is.
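One way to picture this is a registry in which a named aspect gets its own ID, separate from the ID of the entity it is currently attributed to. This is only a sketch under my own assumptions; the Aspect and Registry classes and their method names are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Aspect:
    aspect_id: int
    name: str        # the name this aspect is known by
    entity_id: int   # the entity currently believed to hold this aspect
    attributes: dict = field(default_factory=dict)


class Registry:
    """Hypothetical store giving entities and their aspects separate IDs."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.entities: dict[int, str] = {}
        self.aspects: dict[int, Aspect] = {}

    def add_entity(self, label: str) -> int:
        entity_id = next(self._ids)
        self.entities[entity_id] = label
        return entity_id

    def add_aspect(self, name: str, entity_id: int, **attributes) -> int:
        aspect_id = next(self._ids)
        self.aspects[aspect_id] = Aspect(aspect_id, name, entity_id,
                                         dict(attributes))
        return aspect_id

    def reassign(self, aspect_id: int, entity_id: int) -> None:
        """Attribute an aspect to a different entity; its name is unchanged."""
        self.aspects[aspect_id].entity_id = entity_id


registry = Registry()
will = registry.add_entity("Shakespeare, the historical figure")
bacon = registry.add_entity("Francis Bacon")
playwright = registry.add_aspect("Shakespeare", will, wrote="Hamlet")

# "Shakespeare is not Shakespeare, Bacon is."
registry.reassign(playwright, bacon)
```

After the reassignment the playwright aspect is still named "Shakespeare" but points at Bacon, while the historical figure keeps his own ID untouched.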

Saturday, December 22, 2007

Clark Kent is (not) Superman

[Ed. Note. This is part one of a series found on "Existential Programming, the blog": "A Rose is a Rose is (not) a Rose"]

It delights me to find out that what I thought had been a particular nugget of wisdom, specific to building Identity matching computer systems, actually has a deep principle at work. While working on one of these systems, I learned the strategy of NOT merging all variations of an individual identity's name/address/phone/etc into a single canonical version. It turns out that the need to keep, and assign a unique key to, every variation of identity data (as opposed to only the "canonical" one) has deep roots in language itself...

While reading the Intellectual Devotional (which I highly recommend), I came across its page about "Philosophy of Language" and it had an immediate resonance with a project at my current client. The page describes the "problem of reference" where ideas about what a name "means" have been debated and changed over time.

One theory says that "names" don't have any meaning, in and of themselves, they merely refer to some thing that has meaning. Hence, Shakespeare's quote "A rose by any other name would smell as sweet" summarizes the position that the word "rose" is not meaningful, and could be exchanged with any other word that refers to the thing "rose". That is why "gulaab" (the Urdu word for rose) can work just as well for speakers of Urdu.

Another, more modern theory, though, says that names not only refer to some thing, they also carry the connotation of "in what sense" the thing is being referenced. The book illustrates this with the example of Superman and Clark Kent both being names for the same thing (the being Superman), but they are not interchangeable. Clark Kent (mild mannered reporter) has a work address of the Daily Planet whereas Superman (superhero able to leap tall buildings) does not. It matters which name is used when talking about Superman.

So, in the same way that Clark Kent and Superman both refer to different aspects of the same entity, and are thus not interchangeable, a computer system managing legal entity identity data cannot translate name/address variations into a single entity ID# when those variations actually refer to different aspects of the entity. For example, if there is data that is specific to a particular store branch, that branch needs its own well-known ID# even though it is only a portion of a single legal entity. Further, since legal entity names are not unique (I own two different corporations with the identical legal name), the entire name/address/phone/etc combination needs managing rather than separate "alternate name" lists. It is also not sufficient to support alternate name/address records merely as search aids that still ultimately result in the ID# of the entity-as-a-whole. Otherwise one would lose track of the fact that we were talking about Clark Kent, not Superman.
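A minimal sketch of that strategy follows: every name/address/phone combination is kept as its own record with its own key, and a match returns that record rather than collapsing straight to the entity ID. The IdentityVariation and IdentityIndex names, the ID formats, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentityVariation:
    variation_id: str  # unique key for this exact combination
    entity_id: str     # the legal entity as a whole
    name: str
    address: str
    phone: str


class IdentityIndex:
    """Hypothetical index keyed on the whole name/address/phone combination."""

    def __init__(self) -> None:
        self._by_combination: dict[tuple[str, str, str], IdentityVariation] = {}

    def add(self, variation: IdentityVariation) -> None:
        key = (variation.name, variation.address, variation.phone)
        self._by_combination[key] = variation

    def match(self, name: str, address: str,
              phone: str) -> Optional[IdentityVariation]:
        """Return the matching variation, not just the entity ID."""
        return self._by_combination.get((name, address, phone))


index = IdentityIndex()
# Two branches of one legal entity, each with its own well-known ID.
index.add(IdentityVariation("V1", "E1", "Acme Corp", "1 Main St", "555-0100"))
index.add(IdentityVariation("V2", "E1", "Acme Corp", "9 Elm Ave", "555-0199"))
# A different legal entity that happens to share the same legal name.
index.add(IdentityVariation("V3", "E2", "Acme Corp", "4 Oak Rd", "555-0142"))

branch = index.match("Acme Corp", "9 Elm Ave", "555-0199")
```

Here the match on the Elm Ave record yields variation V2 of entity E1, so branch-specific data can hang off V2 while entity-wide data hangs off E1.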

Friday, November 30, 2007

Odometer Game Redux

Well, after 35 years of pondering what I thought was an abstract mathematical puzzle, my "odometer game" has found a real-world application!

It turns out that my notion of "remarkable" numbers [i.e. numbers that are so remarkable that if the driver saw his odometer sitting on that number he would either honk his horn or point it out to his passengers] is just the ticket for finding "fake" ID numbers.

My current contract at a major bank found me looking for suspect ID numbers, Tax IDs, phone numbers, etc. in various customer databases. The bank employees entering this information would often get around the fact that these fields were "required" by entering syntactically legal digit strings that were nonetheless meaningless. After viewing a few of these it quickly became obvious that they were related to my notion of remarkableness. Actual values found included: 0, 121212121, 000000000, 999999999(9), 111111111, 111111112, 222222221, 888888889, 188888888, 0999999999, 589999999, 255511555 (?)

So, rather than an explicit list of IDs to put on a watch list (as I was asked to find), it became clear that a better answer would have been to use an evaluation function that reported the remarkableness score for each value. A cutoff point could then be established to filter out suspicious values. Alas, while I have casually pondered the mathematics involved in scoring the remarkableness of a number, I've never actually tried to program it. But, now it has become more than an obscure puzzle, and shows signs of having "real world" value!
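For what it is worth, here is a crude first sketch of what such an evaluation function might look like. It is not the mathematics pondered above, just two assumed signals: the longest run of one repeated digit, and how short a repeating pattern explains the whole value. The function names and the 0.7 cutoff are arbitrary choices for illustration, and sequences like 123456789 are not scored at all.

```python
def longest_run(digits: str) -> int:
    """Length of the longest stretch of one repeated digit."""
    best = run = 1
    for prev, cur in zip(digits, digits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def shortest_period(digits: str) -> int:
    """Smallest p such that every digit equals the digit p places earlier."""
    n = len(digits)
    for p in range(1, n):
        if all(digits[i] == digits[i - p] for i in range(p, n)):
            return p
    return n


def remarkableness(value: str) -> float:
    """Heuristic score in [0, 1]; higher means more 'remarkable'."""
    digits = "".join(ch for ch in value if ch.isdigit())
    if not digits:
        raise ValueError("value contains no digits")
    n = len(digits)
    if n == 1:
        return 1.0  # a lone digit such as "0" is treated as fully remarkable
    run_score = longest_run(digits) / n
    period_score = 1.0 - (shortest_period(digits) - 1) / (n - 1)
    return max(run_score, period_score)


def suspicious(values: list[str], cutoff: float = 0.7) -> list[str]:
    """Keep the values whose score reaches the (assumed) cutoff."""
    return [v for v in values if remarkableness(v) >= cutoff]
```

Under these assumptions 000000000 scores 1.0, 121212121 scores 0.875, and 111111112 scores about 0.89, while 255511555 scores only about 0.33 and would slip under a 0.7 cutoff, which fits the question mark it earned in the list above.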