Monday, September 1, 2008

Power-law distributions in empirical data

This paper, Power-law distributions in empirical data by Aaron Clauset, Cosma Shalizi and Mark Newman, will be required reading for me at some point in the future. They analyse how to reliably detect whether a power law is in play in a particular empirical dataset and present some new procedures for doing so, since many of the simple and naive methods are flawed or biased.
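
To make the idea a bit more concrete for myself, here is a minimal sketch of the standard maximum-likelihood estimate of the exponent for a continuous power law, alpha = 1 + n / sum(ln(x_i / x_min)). This is only an illustration and not the paper's full procedure: it assumes x_min is already known and leaves out the selection of x_min and any goodness-of-fit testing. The function names and the synthetic data are my own assumptions.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n samples from a continuous power law p(x) ~ x^(-alpha), x >= x_min.

    Uses inverse-transform sampling; rng is a random.Random instance.
    """
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_alpha(data, x_min):
    """Maximum-likelihood estimate of alpha, assuming x_min is known.

    Returns (alpha_hat, approximate standard error).
    """
    tail = [x for x in data if x >= x_min]
    if len(tail) < 2:
        raise ValueError("need at least two observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all observations equal x_min; alpha is undefined")
    alpha_hat = 1.0 + len(tail) / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(len(tail))
    return alpha_hat, std_err


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(50_000, alpha=2.5, x_min=1.0, rng=rng)
    alpha_hat, std_err = estimate_alpha(data, x_min=1.0)
    print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f}")
```

The contrast with the naive approach is that nothing here involves binning the data or fitting a straight line to a log-log histogram.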

Worrying trends in Econophysics

I enjoyed reading the analysis in the paper Worrying Trends in Econophysics.

I am particularly intrigued by the role of production and by what they mention of Sraffa. Reading their description of exchange vs production really cleared those concepts up in my head; it made me feel like the distinction should have been obvious from the start, but I hadn't conceptualised it that way until now. Having just looked at Sraffa's Wikipedia page, I want to read more about the Cambridge Capital Controversy and the following post over at the Mises Institute, which gives an analysis of Sraffa's ideas from the Austrian point of view.

I found the link in this blog post by Cosma Shalizi, Why oh why can't we have better Econophysics, but I haven't had time to read that yet.

Wednesday, August 6, 2008

Google and the AI

I watched this talk yesterday by Kevin Kelly on "The next 5000 days of the web". He once again confirms that Google wants to build an AI, and he presents his vision of the web's future as one global machine that everything will be connected to. I agree with him, but given his first point, that no one would have imagined today's technology to be possible 5000 days ago, I think that his predictions don't go far enough.





Something I'd like to tie in with this is George Dyson's article at The Edge from a few years back titled Turing's Cathedral. He also talks about how Google is building an AI and how their search technology is in some sense the next level of computer technology, because it provides semantic indexing (to an approximation at least, and I would think that it's continually improving).

All this reminds me of The Technological Singularity. I think it is quite inevitable and is the next step of evolution. I must say that I'm frightened by it to some extent. The future is always opaque, but so far humanity has been the most intelligent organism on the planet (I think that holds true, although perhaps some large companies or the world economy as a whole might also compete for this title). With the approach of the technological singularity, which I believe will involve the creation of some sort of AI, we will lose that privileged position, and that will be a big adjustment for humanity to deal with. I don't think it is necessarily bad, and I think it probably is the next step in evolution so it might well be inevitable, but I can't help but wonder what our role will be in this brave new world.

Most people I talk to aren't particularly concerned about this. They've never actually said so to me, but they remain kinda quiet and almost don't seem to know what to make of the information. I guess perhaps for them it is difficult to conceive of an intelligence bigger than yourself that you are just a part of. For example, I see the economy as a fairly complex information processing machine. The analogy I like to use here is to think of humans as cells and the economy as the human body. Each cell just carries out its own function quite unaware of where it fits into the body (at least I assume so, I don't know very much about biology), but together they form this complex thing that is a human, which can move about and even reason about its own existence. Similarly, even though we only go about our daily lives motivated by money or friends or whatever our personal driver is, we contribute to this bigger organisation whose logic and intelligence doesn't sit in the individual units but rather in the connections between them and the institutions we use (I haven't thought through the details here because to me it seems sufficient to capture the idea that I wanted to express, but I assume one could make the analogy better). For me the intelligence sits somewhere in between the actors, in the connections or the rules of the game, which makes it a tough thing to get because it's not localised anywhere so that we can point at it and say "You see, and that's where the intelligence is."

I saw my first glimpse of this when I read "The Emperor's New Mind" by Roger Penrose, in the passage where he describes John Searle's Chinese Room thought experiment. This was many years ago now so I'm not 100% sure anymore what he was trying to say, but my basic recollection is that Searle was trying to argue that apparent forms of intelligence might not represent consciousness. So, for example, a machine that could translate Chinese into English perfectly and would pass some form of modified Turing test might not really have any particular intelligence beyond being able to do this task of translation. The way to picture this, he said, was to think of the translation algorithm as a set of instructions given to a man locked inside a room. All he receives from the outside are pieces of paper with Chinese symbols/sentences on them, which he doesn't understand one bit. He then applies his instructions/the algorithm to these symbols to turn them into English letters, words and sentences, which he passes back to the person outside the room. We ignore questions of time and practicality since this is a thought experiment. To the person on the outside it might appear that the man speaks perfect Chinese, but we know of course that he is just mechanically following his algorithm and has no knowledge of Chinese whatsoever. What this showed me, however, is not that there is no intelligence here, but rather that we should not look for the intelligence in the execution unit (the human in the room in the thought experiment): the intelligence sits in the algorithm and becomes manifest in the execution of it. Of course if we look for consciousness it gets a little trickier, but I'm not even sure what consciousness is and it could well be just a special form of intelligence about observing oneself or ... look, I don't really have a clue and don't want to get into that here.

I think what is rather startling about the previous example is that it's very difficult to localise the intelligence. I would say the algorithm is the intelligence, but an algorithm is such an abstract entity without any real physical form. I mean we could represent it in spoken language and then store it in the man's brain, or we could write it on paper, or perhaps we could even encode it in a system of knots on strings or some other form. The algorithm itself is abstract but we interact with it through physical representations. The algorithm is almost like one of Plato's perfect forms that exists in a world of ideas and not our world. So the idea that the intelligence exists "inside it" is difficult to get one's mind around. But I think that's how it is, or at least how I see it.

Now to tie this all together. I think the web is a giant intelligence, much like Kelly said in the video. Even though at the moment there is no HAL that we can point at and say that's where the intelligence sits, it is inherent in all the information and the connections and the structures that we have put in place, and it has probably already surpassed the intelligence of any individual human. I think evolution is a process of continually assembling systems of ever more complexity and we were just one step along the way.

Tuesday, July 22, 2008

Gamma

I liked this story (http://recursed.blogspot.com/2008/07/rutgers-graduate-student-finds-new.html) about a Rutgers grad student discovering a new prime generating sequence.

Tuesday, July 15, 2008

Epsilon

I found some posts on this tags issue:

Tags: Database Schemas
http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html

and some performance tests on the above schemas:

Tagsystems: Performance tests
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html


The way I thought about doing it in normal SQL with joins was basically the second approach: keeping it mostly normalised but just using the tag name as the tag id.
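
As a rough illustration of what I mean, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (items, item_tags, tag) and the helper functions are my own assumptions for the example, not taken from the linked posts. The link table stores the tag name directly instead of pointing at a separate tags table, and the intersection is a join with GROUP BY ... HAVING.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a mostly normalised tag schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        -- The tag name itself acts as the tag id, so there is no tags table.
        CREATE TABLE item_tags (
            item_id INTEGER NOT NULL REFERENCES items(id),
            tag     TEXT NOT NULL,
            PRIMARY KEY (item_id, tag)
        );
        CREATE INDEX idx_item_tags_tag ON item_tags(tag);
        """
    )
    return conn


def add_item(conn, item_id, title, tags):
    """Insert one item together with its (de-duplicated) tags."""
    conn.execute("INSERT INTO items (id, title) VALUES (?, ?)", (item_id, title))
    conn.executemany(
        "INSERT INTO item_tags (item_id, tag) VALUES (?, ?)",
        [(item_id, tag) for tag in set(tags)],
    )


def items_with_all_tags(conn, tags):
    """Return (id, title) rows for items carrying every tag in `tags`."""
    tags = sorted(set(tags))
    if not tags:
        return []
    placeholders = ",".join("?" for _ in tags)
    sql = f"""
        SELECT i.id, i.title
        FROM items AS i
        JOIN item_tags AS t ON t.item_id = i.id
        WHERE t.tag IN ({placeholders})
        GROUP BY i.id
        HAVING COUNT(DISTINCT t.tag) = ?
        ORDER BY i.id
    """
    return conn.execute(sql, (*tags, len(tags))).fetchall()


if __name__ == "__main__":
    conn = build_db()
    add_item(conn, 1, "first", ["python", "db"])
    add_item(conn, 2, "second", ["python"])
    add_item(conn, 3, "third", ["db", "python", "web"])
    print(items_with_all_tags(conn, ["python", "db"]))
```

The HAVING clause is what turns "has any of these tags" into "has all of these tags": an item only survives if the number of distinct matching tags equals the number of tags asked for.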

Now the question remains: how do you do this in BigTable without any joins? The full text search schema is still an option, but I don't think that's a scalable solution so I'm not really interested in that approach.

Monday, July 14, 2008

Delta

Some links for today:

GeSHi Demo
Useful for posting highlighted code to the blog.

Hacker News @ Y-Combinator

Why you should play Go

Sunday, July 13, 2008

Gamma

In terms of my tag intersection problem, I came across httpmr, which is (or aims to be) a MapReduce implementation over HTTP.

Beta

My second post.

As I've been hiding behind a rock for the last few months I only got wind of Google App Engine recently and have been looking at it over the weekend. It looks quite fun and I'm quite tempted to play around with it and write a simple app to test it out.

In particular I'm trying to get my head around their datastore and how to use it efficiently.

One particular problem that I just started thinking about is how you would do an intersection of sets given the lack of joins. Look at tags as used by Gmail, Flickr, del.icio.us or any other such app; for concreteness I'll talk about a Gmail-type example. How do you find all emails that have a given set of tags without pulling in all the records for each tag and doing the intersection in memory? This might not be a problem for an email app where each account has a relatively small number of objects, but what would you do if you had a bigger app?

This is a standard problem so I'm sure it has been solved efficiently many times and I hope that I'll be able to find an answer out there if I don't find one myself first. I started thinking about this last night and think I have a solution which I'm busy coding up in Python along with the traditional relational db approach for comparison. Hopefully I'll be able to post this here tomorrow.
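
To pin down the problem itself (this is not the solution I mentioned above, just one generic way of framing it), here is a small pure-Python sketch. It keeps a sorted list of item ids per tag, starts from the rarest tag, and checks the other lists by binary search, so only ids are touched and never the full records. The class and method names are made up for the illustration and it says nothing about how the App Engine datastore works internally.

```python
from bisect import bisect_left


def _contains(sorted_ids, item_id):
    """Binary search for item_id in a sorted list of ids."""
    pos = bisect_left(sorted_ids, item_id)
    return pos < len(sorted_ids) and sorted_ids[pos] == item_id


class TagIndex:
    """Hypothetical inverted index: tag -> sorted list of item ids."""

    def __init__(self):
        self._ids_by_tag = {}

    def add(self, item_id, tags):
        """Register item_id under each of its tags, keeping lists sorted."""
        for tag in set(tags):
            ids = self._ids_by_tag.setdefault(tag, [])
            pos = bisect_left(ids, item_id)
            if pos == len(ids) or ids[pos] != item_id:
                ids.insert(pos, item_id)

    def with_all_tags(self, tags):
        """Return the sorted ids of items that carry every tag in `tags`."""
        id_lists = [self._ids_by_tag.get(tag, []) for tag in set(tags)]
        if not id_lists or any(not ids for ids in id_lists):
            return []
        # Scan the shortest list and probe the longer ones.
        id_lists.sort(key=len)
        smallest, others = id_lists[0], id_lists[1:]
        return [
            item_id
            for item_id in smallest
            if all(_contains(ids, item_id) for ids in others)
        ]


if __name__ == "__main__":
    index = TagIndex()
    index.add(1, ["work", "urgent"])
    index.add(2, ["work"])
    index.add(3, ["urgent", "work", "todo"])
    print(index.with_all_tags(["work", "urgent"]))
```

The cost is driven by the rarest tag rather than the most common one, which is the property I'd want from any real solution.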

Alpha

Howdy there

I hopefully will start putting some thoughts down here as I go along. I acknowledge that this has a high probability of becoming another blog stub, like one sees so often on the internet.


What's in a title?

I thought it was a suitable title for a beginning... and this is the beginning...in more ways than one.

It's the beginning of my blog.

It's the beginning of my Linux career. Actually more like a renaissance. I used to use Linux before but haven't in about 10 years or so. I also never really had my own Linux box where I could run root commands and actually install stuff. With my newly acquired Asus Eee I've been forced to get reacquainted, this time with the Xandros/Debian flavour, and so far I'm loving it!

I'm looking for a new job so I'm also refreshing my knowledge of C++ and learning some Python along the way as well.

Finally, Alpha, Beta and Gamma all have meanings in Quantitative Finance, and this was the title of a blog I wanted to write about 3 years ago explaining these concepts and what I had learned about them. That's not really my focus right now but I might pick up these topics again once I've taken care of my current more pressing needs.