Thursday, December 30, 2010

Model Trains

Continuing from the previous post on using PageRank to rank stations by 'popularity', and with the help of Gephi - which makes it ridiculously easy to build network graphs and work PageRank out - I built the Virgin Pendolino lines, and they look like this
I then worked out PageRanks and picked a route I'm familiar with - the one that goes between Coventry and Birmingham New Street (the brown line).


Old Model

Going back to the equation I outlined in the first train post, the number of passengers on the train at station i is modelled by
So now, we define
Where P(i) is the PageRank of station i, and a is some constant. We then work out 'passenger' numbers for each stop and get the graph below
Here the blue line is number of people boarding, the orange line is people getting off and the yellow is total on board.

And the interesting thing here is that while the on/off lines are quite squiggly, they seem to cancel each other out quite well leaving a fairly smooth curve for total passengers. Which you would kind of expect, since a popular station would have more people getting on AND getting off.

So the question then was, how significant is the wiggliness that's left in the yellow curve.


A Simpler Model

For this, we imagine a train with the same number of stops (10) as above, but where T(i,j)=1 for all stations i,j - i.e. there's only one person going between each pair of stops. And the nice thing about this is that the above monstrosity of an equation reduces to
where i is the current station's position on the route and n is the total number of stops.
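
For anyone who wants to check that reduction numerically, here's a minimal Python sketch. It assumes stops are numbered in order and that nobody travels backwards; the function name and the 0-indexed matrix layout are my own choices for illustration. Under those assumptions the count on board after stop i comes out as i(n - i).

```python
def passengers_on_board(T):
    """Return the number of passengers on board after leaving each stop.

    T[i][j] is the number of people travelling from stop i to stop j
    (0-indexed). Only entries with j > i are used, i.e. the sketch
    assumes nobody travels backwards along the route.
    """
    n = len(T)
    on_board = []
    total = 0
    for i in range(n):
        boarding = sum(T[i][j] for j in range(i + 1, n))   # getting on at stop i
        alighting = sum(T[k][i] for k in range(i))         # getting off at stop i
        total += boarding - alighting
        on_board.append(total)
    return on_board


n = 10
uniform = [[1] * n for _ in range(n)]    # one passenger per pair of stops
counts = passengers_on_board(uniform)

# With stops numbered 1..n, compare against i * (n - i).
closed_form = [i * (n - i) for i in range(1, n + 1)]
print(counts)
print(closed_form)
```

The same function takes any other trip matrix, so a PageRank-weighted T can be dropped in and compared against the uniform case.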

And if you multiply that by some constant to get a best fit and plot them together, you get this (orange is the fit)
Which is a pretty damn good fit. Except for the first two stops. And what's weird about it is, you'd expect that the fact that there aren't any people getting off to cancel out the attenuation would mean the ideal curve would under-estimate. But it overestimates. I'm not sure how to explain it. But other than that..
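
For the "multiply by some constant to get a best fit" step, one way to pick the constant is ordinary least squares. That choice is an assumption for this sketch, as are the function name and the 'observed' numbers, which are made up purely to show the mechanics.

```python
def best_scale(model, data):
    """Least-squares constant c that minimises sum((c * m - d) ** 2)."""
    return sum(m * d for m, d in zip(model, data)) / sum(m * m for m in model)


n = 10
shape = [i * (n - i) for i in range(1, n + 1)]   # the simple i(n - i) curve

# Made-up 'passenger' numbers, for illustration only.
observed = [20, 31, 43, 47, 51, 48, 41, 33, 18, 0]

c = best_scale(shape, observed)
fitted = [c * s for s in shape]

# Per-stop relative error, skipping stops where the observed value is zero.
errors = [abs(f - o) / o for f, o in zip(fitted, observed) if o != 0]
print(c, errors)
```

Printing the per-stop relative errors makes it easy to see whether only the first stop or two are out by more than a few percent.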


Decent Approximation

And what the close fit suggests is that the 'popularity' of the station isn't as important a factor as how far along the route it is. Which, if true, simplifies the problem a massive amount.

But that first station is still a problem, because that's an error of ~20%. And considering <5% is usually considered the acceptable error limit, it's looking kind of bad.

But you could argue that since it doesn't have a knock-on effect for the later stops, and because the number of passengers on board after the first stop can be counted on the platform, it's an acceptable 'problem'.

Realistically, I should try the same for other routes to see if the same happens, or else find some real data to test it against. Because it could turn out I'm catastrophically wrong. Or I could turn out to be right. But for now, I'm content.


Other Factors

And that's that. So the only other thing that would affect passenger numbers on a per-station basis is rush-hour effects. This is significant over longer routes mainly, but it's significant because it affects the passengers getting on at a particular station differently to how it affects those getting off - this creates a disparity that doesn't cancel itself out like the above.

But at the same time, thanks to the Department for Transport, we do have numbers to work from for that. Then we just need to put together that model and multiply by some number - which can be found by counting, say, the number of people on the platform - and we hopefully have a pretty good estimate of whether or not you'll get a seat.

And I'll finally be able to let go of this madness. Fingers crossed!


Oatzy.

Wednesday, December 29, 2010

Rush Hour

So this ultimately turned out to be more an exercise in practicing fitting functions to data. But nonetheless, here it is.

I was browsing the Department for Transport website (as you do) looking for data for the whole trains thing. I didn't find what I was looking for, but I did find this
This is, "Passenger numbers: by time of departure from station", and "represents rail travel in Great Britain as a whole, on an average weekday outside of school holidays". The bottom one is a breakdown by purpose of being on the train; the top one is total numbers.

So the first thing you notice is there are curves for commuters going to and coming back from work - rush-hours. Now I'd thought that there would be a minor peak around midday, but apparently not.

So if you wanted to get an approximate model of this graph, the first place you'd probably start would be to notice that the commuters' curves are fairly 'normal' (in the mathematical sense).

So working on them separately, and using Mathematica's FindFit function you get your curves. You then combine them and adjust them upwards to include business/leisure and you get this
Which is acceptable - it fits the first half better than the second. But not amazing overall. The equation is this, for those interested
So just for the hell of it, I tried getting a better fit using a Fourier series. This time using the totals, and again, this was using Mathematica. But for a Fourier series you're combining Sines and Cosines.

The accuracy of the fit depends on how many terms you include. I tried various iterations until I found the most acceptable fit with the fewest terms, and that looks like this
Which in all fairness is a pretty good fit. Except for that it's got 15 terms and looks like this
Not what you would call elegant.
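
For anyone without Mathematica, the Fourier idea can be sketched in plain Python. This is not the fit from the post: it assumes the curve has been read off at evenly spaced times over one full period (say, hourly over 24 hours), and the function names and sample values are made up for illustration.

```python
import math


def fourier_coefficients(samples, n_terms):
    """Estimate a0, a_k and b_k from evenly spaced samples over one period."""
    N = len(samples)
    a0 = sum(samples) / N
    a, b = [], []
    for k in range(1, n_terms + 1):
        a.append(2 / N * sum(y * math.cos(2 * math.pi * k * j / N)
                             for j, y in enumerate(samples)))
        b.append(2 / N * sum(y * math.sin(2 * math.pi * k * j / N)
                             for j, y in enumerate(samples)))
    return a0, a, b


def fourier_eval(a0, a, b, t, period):
    """Evaluate the truncated series at time t."""
    total = a0
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        angle = 2 * math.pi * k * t / period
        total += ak * math.cos(angle) + bk * math.sin(angle)
    return total


# Made-up hourly samples with a known shape, so the output can be checked.
samples = [3 + 2 * math.cos(2 * math.pi * j / 24)
           + math.sin(2 * math.pi * 2 * j / 24) for j in range(24)]
a0, a, b = fourier_coefficients(samples, n_terms=3)
print(a0, a, b)
```

Raising n_terms trades elegance for accuracy, which is exactly the compromise described above.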

So that's that. I doubt anyone's actually interested, but for all the effort I put in to it, I thought it worth 'formally' writing it up. Otherwise it'd be lost forever somewhere on my laptop and in one of my various notebooks.

The other thing is, it could come in handy with the train problem. But that's a whole other story.

Anyway, I'm starting to ramble, so let's just leave it at that.


Oatzy.

Thursday, December 16, 2010

Weeping Angel Christmas Tree Topper

Template here, in case you fancied making one of your own,
Feel free to modify, redistribute, whatever.

For mine, I reinforced them with card, then in true Blue Peter style used a toilet roll tube and double-sided sticky tape to make the shape of it. In case you couldn't tell, they go back to back, so you just have to turn it to see whichever side.

Or you could do it as a hanging decoration like Andrew did.

For maximum effect, put it on the tree without telling anyone ;)


Oatzy.

Tuesday, December 14, 2010

Follow Up: Facebook Friends

Every freaking time! I do something, like graph my Facebook friends, and I'm one-up'd by Facebook going and doing this,

Hardly seems fair, since FB has the data to plot everyone. And geographically. But there you go.

Their explanation, and a hi-res version of the full image here.


Oatzy.

Sunday, December 12, 2010

Facebook Friends

They look something like this
There's 167 of them - a little over Dunbar's number, for what it's worth - shaded by number of connections. And you've got this massive clump at the bottom and the rest more spread out. And you can split them into 3 major groups,
with everyone else just sort of scattered. The groups are only approximates, by the way; there's overlaps, outliers, omitted nodes, etc. I didn't include the name labels for clarity.

Or you can lay it out in a more circular way,
and again, you can see those same, vague groupings.

So that's all pretty and such.


Want Your Own Graph?

1) Go to Facebook, and run the app netvizz
2) Click "Create a gdf file from your personal network" and download
3) Download, install and run Gephi
4) Import your gdf file, play with layout, colour setting, whatever.
5) ???
6) PROFIT!

Fairly straight-forward.

And as a side note, I wish I'd known about this program sooner. Then, I wouldn't have had to work out PageRanks by hand.


Oatzy.

Wednesday, December 08, 2010

TrainRank

Or StationRank. Either way, it's a misnomer, since PageRank is named after its creator, Larry Page. But I digress.

Anyway, the idea is basically this - going back to the previously mentioned train problem, in a moment of inspiration it seemed worth trying working out PageRanks for the UK train network; this hopefully correlating somehow to where passengers on a given train are likely to be going.


Stations

First of all, there is a total of about 2,518 train stations in the UK, and damned if I'm going to (or even could) work out a PageRank for the entire network. Even if you only use the Virgin CrossCountry routes, you're still working with just short of 100 stations. So for this I used the major stations on the CC line. There's about 30. Major stations, by the way, are the ones with a big circle on the map below
Obviously the simplification has an effect on the results. I compared to station use numbers for those stations included and got a correlation coefficient so close to 0 as to make no odds. But you have to bear in mind the use numbers include all train lines going in and out of each station (not just the Virgin CrossCountry lines).

Anyway, for demonstrative purposes it's good enough.


Ranks

So the TrainRanks are as follows
And at the very least, it seems to fit alright with my experience - my experience being limited to traveling between Sheffield and Coventry.


Random Trains

So what does it mean? For websites, it's based on this idea of a 'random surfer' clicking random links, resulting in probabilities of the surfer ending up on any given page.

So by analogy, we assume a train that moves randomly around the network; and that includes randomly changing direction and taking routes that wouldn't, in the real world, be valid. We then imagine a passenger on this slightly erratic train - a station's TrainRank is the probability the passenger will get off at that station.

Or alternatively, if there are 100 people on this train then, for example, about 7 of those people will get off at Birmingham New Street.

Now obviously, there are some problems with that definition, the most obvious being that that's just not how trains work. Similarly, I don't know if or how this would fit into my previous model. But it's interesting to consider, nonetheless.
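
The 'random train' picture translates fairly directly into code. This is a minimal power-iteration sketch, not what Gephi does internally; the damping factor of 0.85 is the usual textbook value, and the little star-shaped network is made up for illustration rather than taken from the real map.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each station to the list of stations it connects to.
    Every station that appears as a target must also be a key.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        # The 'teleport' share every station gets regardless of links.
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A station with no onward links spreads its rank evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Made-up toy network: one hub with three branch stations.
# Track runs both ways, so each link is listed in both directions.
toy_network = {
    "Hub": ["A", "B", "C"],
    "A": ["Hub"],
    "B": ["Hub"],
    "C": ["Hub"],
}
ranks = pagerank(toy_network)
print(ranks)
```

The ranks sum to 1, so multiplying by 100 gives the "out of 100 passengers" reading used above.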


Simplifying Routes

It does seem to make sense to limit the ranks to given networks, since passengers have to get off the train of one network to leave the station or get on a train for another network. But at the same time, what other networks call at a given station may have an effect on the probability a passenger will get off the train at that station. Maybe.

But on the other hand, it doesn't make sense to simplify to route level, since routes being straight lines, their TrainRanks would probably end up forming something close to a normal distribution - the middle station having the highest probability, and decreasing towards the ends.


Just some thoughts, anyway.


Oatzy.

Tuesday, December 07, 2010

Normal Icicles

Long story short, I was playing with my dad's fancy camera and took this (amongst others) photo of some icicles

Which I was pretty proud of. So proud in fact that I sat and stared at it for longer than is perhaps sane.

As you'll notice, the individual icicles are clustered into groups, with the longest of the groups in the middle and getting progressively smaller moving outwards.


Normal Distribution

The normal distribution is one of those things that pops up everywhere. For example if you measure the heights of a significantly large group of people and plot the results you'll get a graph shaped something similar to this
With the averages approximately in the middle, and the width/height of the curves based on the standard deviation of the sample.

Another nice example, go to Amazon and look at the customer ratings for anything. If a large enough number of people have rated it, you'll probably notice this same sort of shape; usually with the one star rating spiking to not fit the pattern (damn hipsters). Some are more pronounced fits than others.

And as I say, this sort of thing shows up all over the place. This is partly due to the central limit theorem, but that's a whole other kettle of fish.


Equations Everywhere

I was partly inspired by a program I recently watched - The Joy of Stats - and partly by the website "Found Functions", whose creator finds graphs (and their accompanying formulas) in photos of everyday scenes.

I flipped the photo vertically (for clarity) and skewed it slightly to try to account for the fact that it was taken side-on, then put some normal curves on it to hopefully prove I'm not just crazy and imagining it
And another one

They're not perfect fits, partly due to perspective. But you hopefully get the idea. Why do the icicles form like that? Because nature's just fantastic in that way.


Oatzy.

Monday, December 06, 2010

PageRank

In terms of getting some idea of a person's 'influence' on Twitter, working out (a variation on) PageRank is more effective than counting followers and followers of followers, but much less effective than using HP Labs' modified HITS algorithm.

But to count followers of followers I'd need to put in more effort than it's worth, and to do HITS I'd need to scrape an arse load of tweets to count who retweets who and how often.


PageRank

So PageRank basically measures how many people you're connected to, as well as who those people are (connectors and all that). It's most associated with Google who use it as part of their search result ranking algorithm.

In this case we replace pages with users, and for links we say Alice following Bob is equivalent to page A linking to another page B.


Another Updated Graph

Working out PageRank was easier than the other two for one simple reason - I already had a lot of the data I needed, from previous network graphs. Obviously I had to update it first, and this is the graph as it stands (without me)

Which has become almost inexplicably more complex between the last one and this one. But there you go.


Results

Anyway, I followed the algebraic approach from the wiki article (looking like the easiest for what I had) and the results, in rank order, are as follows:
[When looking at the ranks, it might help to make sense of them more if you refer to the graph above.]

You have to bear in mind this particular PageRank calculation is only for my friend network, and assumes it's isolated from the rest of Twitter for simplicity. To get everyone's proper PageRank you'd have to analyse the whole of Twitter.


WTF?

The only question now is, what do the results actually mean? I'm not entirely clear on that.

In the case of websites,
PageRank is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page
I think in the Twitter case it's more to do with how tweets spread and who in the network is more (or less) likely to see them.


If I ever work it out I'll let you know. Otherwise, feel free to offer your own explanation.


Oatzy.

Saturday, December 04, 2010

Quick Look: Tumblr Reblogs

The nice thing about Tumblr, is it's kinda like Twitter in that you can repost something someone else posts; but unlike Twitter, it's easy to keep track of what's going where. And on top of that, things tend to spread further.

Basically, if you go to the page for a particular post, under 'notes' is a list of every like and reblog that post gets. So all you have to do is copy/paste all that into a text file, and load it into Google Refine (or spreadsheet, if you're so inclined); parse, rearrange, tidy, so you have two columns - reblogger, reblogged - load it into ManyEyes and you can make a network graph for who passed the picture on from whom.
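
If you'd rather skip Google Refine, the parsing step can be sketched in a few lines of Python. The big assumption here is the shape of the pasted notes: one note per line, with reblogs reading "X reblogged this from Y" and likes reading "X liked this". The usernames in the sample are made up.

```python
import re

# Assumed note format: "<reblogger> reblogged this from <reblogged>"
NOTE_PATTERN = re.compile(r"^(\S+) reblogged this from (\S+)$")


def parse_reblogs(notes_text):
    """Return (reblogger, reblogged) pairs, ignoring likes and other lines."""
    edges = []
    for line in notes_text.splitlines():
        match = NOTE_PATTERN.match(line.strip())
        if match:
            edges.append((match.group(1), match.group(2)))
    return edges


# Made-up sample of pasted notes.
sample_notes = """alice reblogged this from original_poster
bob liked this
carol reblogged this from alice
"""

edges = parse_reblogs(sample_notes)

# Tab-separated, two columns, ready to paste into a spreadsheet.
for reblogger, reblogged in edges:
    print(f"{reblogger}\t{reblogged}")
```

The two-column output is the reblogger/reblogged table described above, which is what the network graph is built from.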

Looking at the graph, you can then divide a post in to one of two groups:

1) Self-Centred

The example for this is a photoshop I did putting various memes (that were big at the time) in to one "Ministry of Silly Walks" picture.
And the graph looks like this, with me at the centre

2) Fan-Centred

For this, the example is this Doctor Who pic - my most reblogged picture - which I actually stole from elsewhere on the web (which is quite common for Tumblr)
And the graph in this case looks like this, with a fan page (note the name) being at the centre of most reblogs and me (red circle) pushed to one side.


Anyway.

None of this is actually important. Vaguely interesting maybe. It's basically what I'd've done a while back with retweets on Twitter, if Twitter would facilitate it. Why do things spread further on Tumblr? Damned if I know. The average retweet apparently only goes about two jumps from the originator.

But again, this is another good example of connectors and all that - how passing something through the right person/people can make it explode in popularity and spread massively.

In the Doctor Who one, "timelordian" only passes it on to one person from me. But if they weren't there, it might not have reached "thetardis", in which case, the bottom half of the graph would disappear. So again, you don't have to pass something on to a lot of people to have a massive effect. You just have to pass it on to the right person.


Oatzy.

Sunday, November 07, 2010

Follow Up: Twitter BFFs

Last time I tried to do this, I tried to use as large a range as possible for the data and found some people missing. I still don't know why this is. But coming back to it, I tried looking at just the previous week, and the missing people reappeared.

I took the numbers for the last two weeks. I don't entirely trust them, but as always I try to work with what we can get.

[edit] - I just checked the numbers again, over the same period, and the numbers for follower replies have changed fairly significantly. This displeases me.


The Numbers!

Using the method outlined in the previous post, the fifth column is the geometric mean of mentions (assuming a two-way connection between myself and all listed). For comparison, the fourth column is the arithmetic mean. The numbers are fairly similar.

As I said last time, I can't adjust for talkativeness. It's possible the picture would be dramatically different if I did have the numbers, but I guess we'll never know.

The sixth column is just the geometric means as a percent proportion of the total sum of the geometric means. This is what the names have been sorted by (in descending order).

[I'd make a pie chart, but frankly, I can't stand them.]

[Side Note] - the correlation coefficient of the replies is 0.62 (on a scale of -1 to 1).

tl;dr 

So there you have it. You can take the first three or five names and call them my top friends (or BFFs, if you're so inclined) for the last two weeks.


Connection Graphs

Firstly, here's a table of how many friends I have in common with friends.
As I said last time, you could incorporate this into the "BFF score". I didn't, but it's there in case you're interested.

Aside from that, here are a couple of coloured network graphs.

First, men in blue, women in red. No immediately obvious patterns.
Second, people I know in real life in red, web people in blue.
This one's slightly more interesting in that the IRL friends are all clustered together in the middle with all the web friends spread around the edge. Which you might well have expected, but it's nice to be able to see it.

Good.


Oatzy.

Thursday, November 04, 2010

NaNoWriMo

Or National Novel Writing Month, for those unaware and curious.

Basically, for the duration of November you attempt to write (at least) 50,000 words worth of novel (~1,666 words a day). I considered doing it last year - which would have made more sense, given I was on a whole "I want to be a writer" thing - but I didn't for some reason. What's different this year? Mostly the competition. Nothing motivates me more than competitiveness :D

Hurray for Twitter.

A novel, by their definition is a long work of fiction, and under that definition a collection of short stories can count if they share a common theme. Which is good, because I could not think up a novel worthy plot. Plus I'd probably lose interest and give up part way through.

I had a collection of old ideas (and one that came to me in a dream a couple of weeks ago) but had to come up with some dubious link.

Anyway, I chose mental disorders.

But not in a "this person has this condition, look how weird they are" sort of way. More in the way that Fight Club is 'about' Dissociative Identity Disorder, how Memento is about Anterograde Amnesia or how Truman Show could be about paranoid schizophrenia (think about it). It's more a plot device. Or something like that. I'm not savvy to the literary lingo.

I called it "Twisted Smiles & Fractured Minds" partly because it fit, but mostly because it sounds cool.

The stories so far are:

Day 1) His Missing Face - Body Integrity Identity Disorder (based on the dream)
2) My Beautiful Eyes - Narcissism, and the opening to a series
3) Prologue - a flowery vignette (see below for extract)
4) My Imperfect Rock - continuation of narcissism, with mild codependency
5) My Broken Mirror - either Anorexia or Bulimia. Possibly. The third in the narcissism series.

(Titles subject to change)

I'm being vague in the off chance I let you read any of them. I don't want to give any spoilers :]

Much beyond that, I'm going to be scraping the DSM-IV and wikipedia for ideas - which is another bonus to having mental illness as a 'theme': plenty of potential jumping off points. Also, mental illness is just plain fascinating.

I am finding, though, that the ideas seem better in concept than in execution. Not that that's too much a problem, since quantity matters more than quality in NaNo.

You can find my profile here, and watch my word count if you're so inclined. Or, since the site's a bit slow and unresponsive at the moment, you can see the 'novel' summary and an extract in the screenshot below.

NB/ It's not all written in the style of the extract.
Click to embiggen. Cover art uses this stock image. Font based on my handwriting. It could be better, but it'll suffice.

And if you're also doing NaNo, feel free to add me as a buddy.

Whether or not you'll ever get to see what I write for NaNo will very much depend on how I feel about them once it's over and whether or not I can be bothered to edit them to bring them up to an acceptable standard. I'm a poor judge of the quality of my own work and usually end up hating it, so that may never happen.


Oatzy.

Monday, October 18, 2010

Twitter BFFs

Mentions

So if you wanted to make a very basic BFF app, you would probably take the means of the numbers of mentions between a user and each of their followers, then pick out, say, the 5 highest.

So maybe define R(a,b) for two users as the number of times 'Alice' mentions 'Bob', and R(b,a) vice versa (assuming that both counts are taken over the same period of time).

In this instance, I would probably go with the geometric mean,
since this would weed out any one-sided relationships by making the result zero.

You'd probably also want to factor in whether they're following each other, so define
and multiply the mean by it. Or, in other words, make it zero if they don't both follow each other.

And that would suffice.
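
As a rough sketch of that very basic BFF app in Python (the function names, the data layout and the numbers are all made up for illustration):

```python
import math


def bff_score(r_ab, r_ba, a_follows_b, b_follows_a):
    """Geometric mean of mentions, zeroed unless both follow each other."""
    mutual = 1 if (a_follows_b and b_follows_a) else 0
    return mutual * math.sqrt(r_ab * r_ba)


def top_bffs(mentions, k=5):
    """mentions maps name -> (sent, received, follows_back); returns top k names.

    Assumes the user follows everyone listed, so only follows_back matters.
    """
    scores = {
        name: bff_score(sent, received, True, follows_back)
        for name, (sent, received, follows_back) in mentions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up numbers: (mentions sent, mentions received, follows back?)
mentions = {
    "bob": (4, 9, True),      # two-way conversation
    "carol": (10, 0, True),   # one-sided, so the geometric mean is zero
    "dave": (2, 2, False),    # doesn't follow back, so zeroed
}
print(top_bffs(mentions, k=1))
```

The one-sided and non-mutual cases both drop to zero, which is the whole point of using the geometric mean and the follow check.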

The only major downside to this is - imagine a pair of people who talk a massive amount, but all of what they say to each other is abusive. BFF wouldn't be the best way to describe them.

Unfortunately, unless you add "sentiment analysis" to the app - which is a little hit and miss at present - you're just going to have to assume everyone talks nice to each other.


My BFFs

Now, this is another situation where the best numbers aren't readily available - those ideally being:
1) date started following
2) total number of @replies since then

This obviously raises the same problems I talked about in the previous post. But for demonstrative purposes, I'll use the number I can get.

So for this, I'm using Twoolr, over the period Sept 1 - Oct 18 (48 days).

Now a problem arises here. Twoolr seems to be missing a significant number of people for the second column, who I know for a fact have @replied me during that period. I don't know what's going on there, and unfortunately there's not much I can do to help it.

So the last column isn't really much use. But it could suffice to take, say, the top 5 and call them my Twitter BFFs over that 48 day period, if only so we have something vaguely resembling an answer.

Or alternatively I can just go here, click on 'closest friends', and get this:

Covering about 2 weeks. But then, that particular site's numbers haven't been entirely reliable in the past..


Accounting for Rates

Given that people tweet at different rates - and may talk a lot with almost all their follows - it would probably make sense to take this into account.

For this I would probably say take

i.e. the proportion of Alice's total @replies that are directed at Bob. So for my numbers above the total is 525, and you'd divide everything through by that.

So then you can work that out for R(a,b) and R(b,a), take the geometric mean as before, multiply by 100 and your result should be a value between 0 and 100 vaguely indicating how 'close' Alice and Bob are.
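
In code, that rate-adjusted version might look something like this. The 525 total is the one from my numbers above; every other figure is made up, and the function name is just for illustration.

```python
import math


def closeness(r_ab, total_a, r_ba, total_b):
    """Score from 0 to 100 based on each user's share of @replies to the other.

    r_ab: replies from Alice to Bob; total_a: all of Alice's replies.
    r_ba: replies from Bob to Alice; total_b: all of Bob's replies.
    """
    if total_a == 0 or total_b == 0:
        return 0.0   # someone who never replies can't have a share
    share_ab = r_ab / total_a
    share_ba = r_ba / total_b
    return 100 * math.sqrt(share_ab * share_ba)


# Alice: 21 of her 525 replies went to Bob.
# Bob (made up): 16 of his 400 replies went to Alice.
score = closeness(21, 525, 16, 400)
print(round(score, 2))
```

Here each gives the other 4% of their replies, so the score comes out at about 4 out of 100.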

Which sounds ridiculously straight forward. But that only works if you can get all the numbers. But that would require authorisation from both users in question, which in practice isn't easy.

Incidentally, this is why people are so desperate and willing to buy, sell and steal your personal data. It's valuable! Especially to marketing people.


Improving the Results

So there are other things you might want to include in the calculation to improve the 'quality' of the results. That is, there are other things we can measure that are also good indications of friendship and closeness.

Again, the absolute ideal way of working out how good friends people are would be to go through everything they send to each other (and not just on Twitter) to look at what exactly they're saying. But that's time consuming, and people tend not to like it when you invade their privacy and analyse what they say, to that great a degree.

1) Network

Basically, how many friends Alice and Bob have in common. You could even go so far as to work out the biggest 'clique' the two are part of - that is, the largest group Alice and Bob are in, such that everyone in that group are friends with each other.

So in general, you'd expect Alice to be closer friends with Bob than Carol if Alice and Bob have more friends in common - or are part of a large clique - than Alice and Carol.
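
Both measures are easy enough to sketch in Python, assuming you have each person's friend list as a set and friendships are two-way. The clique search below is brute force, so it's only sensible for small networks; the names and friendships are made up.

```python
from itertools import combinations


def common_friends(friends, a, b):
    """Friends that a and b share. friends maps name -> set of friend names."""
    return friends[a] & friends[b]


def largest_shared_clique(friends, a, b):
    """Largest group containing a and b in which everyone is friends."""
    if b not in friends[a]:
        return set()   # a and b aren't friends, so no clique holds both
    candidates = sorted(common_friends(friends, a, b))
    # Try the biggest groups of mutual friends first.
    for size in range(len(candidates), -1, -1):
        for group in combinations(candidates, size):
            if all(v in friends[u] for u, v in combinations(group, 2)):
                return {a, b, *group}
    return {a, b}


# Made-up network: Carol and Dave both know Alice and Bob, but not each other.
friends = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob"},
    "dave": {"alice", "bob"},
}
shared = common_friends(friends, "alice", "bob")
clique = largest_shared_clique(friends, "alice", "bob")
print(shared, clique)
```

In this toy case Alice and Bob have two friends in common, but the biggest clique they share only has three people in it, since Carol and Dave don't know each other.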

I updated the network map (old version here) to clarify this point, and also just because I can.
[click to embiggen. Interactive version with me here. Version without me here.]

2) Retweets

This tends to be better at showing how interesting a user finds another's tweets. So if you did decide to include it, you probably wouldn't give it much weight.

But at the same time, one would like to believe that - if for example someone you liked and someone you were indifferent to tweeted the same thing - you'd be more likely to retweet the person you liked. So it's mildly worth considering.

3) Follow Friday

Like retweets, this can indicate how interesting a user is as much as it can indicate friendship. Ideally, you'd want to consider the reason (if any) accompanying the #FF tweet.

4) Lists

This is another one that's only useful if you can interpret how the lists are defined. Being on a list called "arseholes" for example, doesn't exactly suggest friendship.


But as I say, if you don't want things to get over-complicated, you can just ignore all the above.


The Friendliest Tweeter

If you want to be really clever, you could work out the numbers for all of a user's followers, then combine them (in some way) to get a 'friendliness score' for that user.

Which is nice, because it would give people a way to compete over who's the friendliest (with their follows/followers).

Possibly more on that in a later post.


And yes, the title was a bit of a cheat since I can't actually tell you who my Twitter BFFs are. Sorry.


Oatzy.

Friday, October 15, 2010

Thoughts on the Feasibility of Pokémon

If I were any good at drawing, this would be illustrated. But I'm not, so it isn't.

First of all, if Pokémon were made real, battling them would pretty much be out of the question, since it would likely be considered animal cruelty (akin to cock fighting, etc). Plus, you'd have lots of Daily Mail readers bitching about how genetic engineering is unnatural or whatever. But for convenience, I'm ignoring these facts.

Second, I should point out that I'm not a biologist, so if I get anything wrong - Mrs Elston, Mr Bestford, I apologise.

All this will assume current or plausible future technology. This is not to say that some of the things I may dismiss will always be impossible. You never know really.


So there are three key things you need to work out if you're making pokémon:
1) aesthetic correctness
2) "evolution"
3) "moves" - whether they're possible and how.


Possibility

First and foremost you have to ask, is a particular pokémon possible at all?

And for this I mean like, for example, ghost pokémon - no, ghosts don't exist. Pokémon made of solid stone or steel - no, I can't really see how that would work.

But aside from those, there are some that would be possible if you made minor (or slightly major) changes.

One example - pokémon that are perpetually on fire. Aside from health and safety issues, I can't see how you'd make it work. Especially for Ponyta and Rapidash - a horse with a flaming mane? I can't see it ending well. Unfortunately, if you take away the fire, they're pretty much just horses.

Similarly, things like Blastoise - the giant turtle with the retractable metal water cannons in its shell - definitely needs some tweaking. Maybe if they were fixed, bone-based cannons that were somehow attached to something akin to a blowhole?

With regards steel/stone pokémon, it could work as either (pseudo) sentient robots for the likes of Voltorb, Magnemite, etc; or in the case of say Onix, maybe a large snake with a hard, rock-like armor. And similar for other rock pokémon.

But these tricky ones aside, there are plenty that are quite plausible.


Creation

There are two methods of creation, as it is. In the cases of (mostly) aesthetic variations of pre-existing animals, you could get away with selective breeding, at least to a greater extent.

This would work best in, for example, some bird pokémon like Pidgey, Swellow, Starly; or some reptiles like Ekans, Arbok, Kecleon..

And in some cases, like Eevee - setting aside the whole evolutions thing (more on that later) - you could quite easily have an Eevee by just buying an appropriate breed of dog (pomeranian?) and grooming so as to give it the correct appearance.

Similarly for things like Teddiursa and Ursaring that need little to no change from pre-existing animals.

In terms of markings, some may be harder to breed for definite recurrence - i.e. there'll be a lot of randomness in specific markings. And there are some colours, of fur in particular, which I'm not entirely sure are possible naturally.

For ones that don't significantly resemble existing animals, but resemble extinct animals such as dinosaurs, it'll be a wait. Beyond that, you may struggle.

Things like Bulbasaur, I don't know how you'd graft a living plant onto a reptile - unless of course it's not actually, physically a part of the reptile, but has a symbiotic relationship with it. Somehow.

Anyway. I could go through all the pokémon, but frankly I'm not going to. 600+ pokémon? Pfft!

Maybe some other time.


Evolution

First of all, the pokémon definition of 'evolution' is quite different to Darwinian evolution. It's wrong for one thing. But that's incidental.

The whole evolution thing is a pretty big part of pokémon. But I'm just going to go ahead and say this - dramatic, instantaneous metamorphosis is ridiculous and impossible.

Realistically, it has to happen over an extended period - so for example, with the Pidgey evolutions the change would work gradually, and you'd have intermediary phases between it and Pidgeotto and Pidgeot. And this would work for a lot of pokémon, albeit with a little creativity.

Insect pokémon - a lot of them mimic real insects pretty well anyway. They just speed things up, missing out the gradual change.

Obviously, this evolution would be time-dependent rather than as a result of 'experience'. Which does take the fun out of it slightly, since it would make the evolution of your pokémon feel like less of an achievement.

Plus it would mean that your little sister could have a Gyarados just by catching a crappy little Magikarp and waiting. And besides that, Magikarp's evolution would be less impressive if it were gradual. Unless some sort of cocoon-like deal were used.

The more 'magical' means of evolution - i.e. stones, trading; you might be pushing it.

I suppose, in the case of stones, they could trigger specific genes to activate - either by being radioactive, or some such crap - and again, over time, that would lead to a particular final outcome. Trading, I can't see that ever having an effect. Certain held items, pretty much the same deal as the stones.


Battling and Abilities

Okay, so unless your pokémon wins the battle, sustaining only minor injuries (if any), in the real world there are really two outcomes to a battle:

1) Unable to continue - either by being knocked-out, comatose, crippled - and I mean genuinely crippled. These things could potentially be healed over time - but not as instantaneously as the Pokémon Centre does - and odds are in most cases they'll never be able to fight again.

2) Dead.


Normal, Fighting, Flying, Rock, Steel, Dragon, Ground, Bug,.. - pretty much any physical attack, fair enough. It may take some training or may have to be taught, but still.

Certain attacks may not be possible for certain animals. Whatever. It's straight-forward enough.

Most defensive attacks are pretty straight-forward as well. Falling asleep in the middle of a battle - unlikely. And I doubt the sleep's restorative power would be worth it.

Special attacks. Now:
Psychic - might be vaguely possible in the future. Maybe. But that might be pushing it. Would likely be in the form of some implantable technology.

Electric - most plausible of the specials, given that creatures such as electric eels already do this sort of thing, albeit short range. You might be able to enhance that.

Water - possible, in so far as it's possible to manipulate any water that may be around you. Out of water, you might have a problem, unless they can store it somehow, or you just carry bottles of water with you.

Ice - similar to above. You can manipulate surrounding ice/snow, but I can't think how it'd work away from snow.

Fire - not that sure. Suggestions have been put forward for how fire-breathing dragons could work. Maybe by storing naturally produced gases (methane, for example) and creating some sort of spark using claws or teeth or something as a sort of flint. Hell, hold a lighter in front of a cow's mouth and wait - fire-breathing cow! So yeah, it's mildly possible.

Grass - again, it's manipulating your environment, or possibly using any foliage that might be a part of your body.

Poison - fairly self-explanatory.

Ghost - no. Non-physical Dark - no. Other specials of the more physical attacks - might be pushing it

And again, all of this comes down to some genetics, and some training (as appropriate).

Abilities, some are plausible - such as having good balance, being flame-resistant, etc - while others are slightly less so - like levitating.


Bonus: Shiny Pokémon

These are Pokémon with alternate colouration - a minor aesthetic difference. So realistically, while we're genetically engineering them, we can give them a recessive gene, or else a set of genes prone to mutate in rare instances - that will cause the change in appearance in approx. 1 in 8192 individuals of a species.
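
As a rough sanity check on the recessive-gene option: if shininess came from a single recessive gene and the population bred randomly (a Hardy-Weinberg-style assumption of mine, not something the games specify), the 1 in 8192 rate pins down how common that gene would need to be. A minimal sketch:

```python
import math

# Target rate of shiny individuals, as quoted above.
SHINY_RATE = 1 / 8192

# Assumption: one recessive allele, random mating (Hardy-Weinberg).
# An individual is only shiny with two copies, so rate = q ** 2.
q = math.sqrt(SHINY_RATE)      # frequency of the shiny allele
carriers = 2 * q * (1 - q)     # one copy: looks normal, can pass it on

print(f"allele frequency:  {q:.4f}")
print(f"carrier frequency: {carriers:.4f}")
```

Under those assumptions the gene only needs to sit at a little over 1% of the gene pool, with roughly 2% of normal-looking individuals carrying one copy.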


Technology

Pokéball - for the foreseeable future, out of the question. Not until (or unless) we develop transportation and similar derivative technologies. I wouldn't hold your breath.

Obviously, this presents a problem - in the cases of the larger pokémon, where do you keep them? In a good handful of cases, there are Pokémon too large to keep even in your back garden.

So either make them such that they don't exceed some size (say, some zoo friendly size) or else try to avoid those which will cause this problem.

Healing - as discussed above, I can never imagine a way of healing potentially fatal injuries in 5 seconds or less. We're talking standard veterinary care - albeit in the future slightly more advanced. And the vets would, first, have to be trained to treat Pokémon.

Potions - I can't think of a real world equivalent, besides general curative pharmaceuticals.

Paralyse heal - only if it's, say, hysterical-paralysis or down to muscles tightening or something of that sort. You certainly can't spray a magic potion and heal broken bones.

Awakening - maybe. Though in most cases, just waiting would be sufficient.

Antidote - i.e. anti-venom, sure, but I don't know if an all-in-one is possible.

Burn Heal - Yeah, but it won't be instantaneous.

Ice Heal - again, just giving it time would suffice, or else wrapping them up for quicker results. Any worse than that, your Pokémon's fucked, unless it can produce its own anti-freeze.

Full Heal, Full Restore and berries - don't push your luck.


Conclusion

So some pokémon aren't entirely implausible. Which is nice. Others, no chance.

As Joe pointed out, there are risks with genetically engineering pokémon. Lord knows we wouldn't want a Mewtwo on our hands. (Or would we?). Although, given all the rigor that goes into genetic engineering, we shouldn't have such dramatic problems.

But it's worth keeping in mind, there is always the risk of unforeseen and unintended consequences. Ideally, you'd want to make them - at least initially - on a Jurassic Park-type island, where they'd be contained with minimal risk to the general public.

And as a final thought, if we keep in mind that scientists haven't made unicorns yet - which is a fairly minor alteration to a pre-existing animal - we might be waiting a while.


Oatzy.


Bonus - various photoshops of 'real' Pokémon:

Saturday, October 02, 2010

Follow Up: The Women Really Do Tweet More!

As Aerliss quite rightly pointed out, taking an average over the last 2 weeks isn't that accurate for anyone who has been uncharacteristically quiet or talkative over that period. Problem was, I couldn't get numbers for further back than that.

In fact, it turns out the numbers I would've gotten with the Twitter API, I can get in browser by going to, for example http://twitter.com/statuses/friends/oatz.xml.

And what you can get in there is the date a user joined and their total number of tweets, which means you can work out a 'lifetime average'.

For those interested, here is the code that turns date joined into number of days online.
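
The original snippet isn't reproduced here, so the following is only a minimal sketch of the idea, not the code that was actually used. It assumes the join date arrives as a `created_at` string of the form `Fri Oct 01 09:30:00 +0000 2010`; the function names and sample values are hypothetical.

```python
from datetime import date, datetime

# Assumed timestamp layout, e.g. "Fri Oct 01 09:30:00 +0000 2010".
# Day and month names are parsed using the current (English) locale.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def days_online(created_at, today=None):
    """Whole days between the join date and `today` (defaults to now)."""
    joined = datetime.strptime(created_at, CREATED_AT_FORMAT).date()
    today = today or date.today()
    return (today - joined).days


def lifetime_average(total_tweets, created_at, today=None):
    """Tweets per day over the account's whole lifetime."""
    days = max(days_online(created_at, today), 1)  # avoid dividing by zero
    return total_tweets / days


if __name__ == "__main__":
    # Made-up account: joined 1 Oct 2010, 30 tweets by 11 Oct 2010.
    print(lifetime_average(30, "Fri Oct 01 09:30:00 +0000 2010", date(2010, 10, 11)))
```

Passing `today` explicitly makes it possible to reproduce figures "as of" a given date rather than whenever the script happens to be run.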

Anyway. I ran the numbers. Some were surprisingly close to those I got yesterday. Others were wildly out.


The Numbers

What I found was that there are 3 particular outliers - 3 women who are WILDLY more talkative than anyone else I follow.

So for the sake of naming and shaming, and so you can all feel vaguely competitive with each other, here are the numbers:
Sorted by rate, coloured by gender. Click to enlarge.


Needless Complaining

Now, I do still have some qualms with this approach. For example, I joined last May. But there was a period of around 6 months when I never went on Twitter. So if we could exclude that period my average would be more representative of my tweet rate (and higher). That said, my average is still one of the higher ones.

I don't know if and how this affects other people, but there are some cases where the lifetime-averages are dramatically different to the 2week-averages.

Still, it does give me some of the numbers that were missing from yesterday's. So pros and cons. But in all honesty, drawbacks aside, I do trust these numbers more than yesterday's.


The Results

Enough waffling, here are the revised results:
Men:
Average = 5.29 tweets/day
Standard Deviation = 4.93

Women:
Average = 19.21 tweets/day
Standard Deviation = 24.94

So yeah. I wasn't imagining it. The women I follow really do tweet (on average) a lot more than the men!


A Graph

To show the spread - but mostly just because I can - here are the rates plotted on a graph (natural-logarithmic scale). Blue squares are women, orange diamonds are men.
So with the women you have a big clump in the middle and a handful at the extremes, whereas the men are sort of more evenly spread.

Even if you exclude the 3 female 'outliers', the average - 7.28 - is still higher than the men's, although the standard deviation becomes lower than the men's.

If you include me in the men's average, theirs still only goes up to 5.79.

So there you have it.


Random facts

- Of the people I follow, abooth202 has been on Twitter the longest - 966 days - with 5th November 2010 being his 1,000th day.

- miss_popcouture has posted the most tweets in her time on Twitter, with a heroic 49,922.



Oatzy.


* All figures correct as of 2nd October 2010. Subject to variation over time.

Friday, October 01, 2010

More Twitter Insights

Introductions

First of all, following a few recent new follows, here's the revised introductions graph.
[Click to enlarge. Interactive version here.]

Main new thing going on is that some of the chains are getting longer. Also I put in some of the people I missed out last time (for clarity), just so it's more complete.


Gender

Recently, I thought I noticed that a lot of my timeline was filled with tweets by women-folk. Obvious first conclusion being that I must now be following more women than men.

Having looked into this matter further, I actually found that I'm following almost equal numbers of men (15) and women (14) - excluding celebrities and dead-accounts.

Next question then is - What the hell?

Okay, this next theory may sound a little sexist but bear with me - what if it's just that the stereotypes are true and women really do talk more (on average)?

Given that a recent update to the Twitter API borked my previously mentioned programs and I don't (yet) know how to fix them, I had to get numbers and such by hand. Which made life a little harder.

Individual tweet rates are only approximate, and based on an average over the last ~14days. And I'll be honest, I don't entirely trust the source numbers. Also, I couldn't get numbers for some people, which isn't ideal.

NB/ For more applicable results, it would've made sense to only count tweets that would have shown up in my timeline - i.e. excluding tweets @ people I don't follow. But working that out would require a lot more effort, and frankly I don't care enough about that degree of correctness to bother.

Anyway, here are the results:
Men:
Average = 5.2 tweets/day
Standard Deviation = 4.29

Women:
Average = 6.6 tweets/day
Standard Deviation = 4.81

So yeah. The women tweet about 27% more than men. They also have a slightly greater spread of rates.

Basically, the women that tweet the most, tweet more than the men who tweet the most.  Which pushes the average and spread up.
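
To see how a single heavy tweeter pulls both numbers up, here is a minimal sketch using Python's `statistics` module. The rates below are invented for illustration, not the real figures, and I'm assuming the sample standard deviation, which may not be the one used above.

```python
from statistics import mean, stdev

# Hypothetical tweets/day figures, NOT the real data behind the results.
men = [1.0, 2.5, 4.0, 6.0, 12.0]
women = [1.0, 2.5, 4.0, 6.0, 20.0]   # same as the men, bar the top tweeter


def summarise(rates):
    """Return (average, sample standard deviation) for a list of rates."""
    return mean(rates), stdev(rates)


def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


men_avg, men_sd = summarise(men)
women_avg, women_sd = summarise(women)
print(f"Men:   average = {men_avg:.2f}, sd = {men_sd:.2f}")
print(f"Women: average = {women_avg:.2f}, sd = {women_sd:.2f}")

# The quoted averages of 6.6 and 5.2 are where 'about 27%' comes from.
print(f"{percent_more(6.6, 5.2):.0f}% more")
```

Changing just the largest value is enough to raise both the average and the spread, which is the effect described above.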

Again, these results are only approximate and only apply to my personal network, and could vary (significantly) over time. Also if we had the missing data, it could turn out the opposite is actually true.

In fact, someone already did research on this matter for a random sample of 300,000 Twitter users. What they found was that, while there are slightly more women (55%) on twitter than men,
"We found that an average man is almost twice more likely to follow another man than a woman. Similarly, an average woman is 25% more likely to follow a man than a woman."
And that, on average, men and women tweet at about the same rate.

Maybe I'm just following particularly talkative women/quiet men...


Location

Just because I can.

These are maps of people who follow me, rather than just people I follow (which I would've preferred). But I couldn't find something that could do that, so gave up and settled on this. Interactive version and make your own here.

World View
UK View
[Click to enlarge.]


Bios

For the people in the graph at the top of the page, I took their bios and made this word cloud [click to enlarge. Interactive here.]:
This is the company I keep - geeks and writers.


And One More Thing

Previously mentioned Malcolm Gladwell wrote an article for The New Yorker recently, about activism in social networks - Twitter in particular - called "Why the Revolution Will Not Be Tweeted".

In it, he explains how and why social-media based 'activism', is quite different from and less effective than real-world activism - the sort of activism that brings about genuine change. Rather, social-media activism is good at getting lots of people to participate, but they (mostly) only do so with the least amount of effort.

So, for example, they might join a FB group, sign an online petition, or even do the sort of crap 4chan pulls. But they tend not to actually go out and protest, where genuine commitment is needed and where there's the risk of say physical harm. And the resulting pay-off is much less significant as a result.
"This is because, Gladwell says, online networks are all about weak ties — a weak tie is a friend of a friend, or a casual acquaintance — whereas real activism depends on strong ties, or those people you know and trust"

In response, Jonah Lehrer - writing for Wired - argues that Gladwell's dismissal of weak ties in social activism may be a little short-sighted. And in particular it can be necessary for a leader of a cause to have lots of weak ties, so as to have greater reach - or at least, this seems to be the case in real world situations.

Gladwell also talks about how a hierarchically structured group is more effective in activism than a decentralized-network structured group - as is often the form online groups take. And again, this is required for greater levels of discipline, control and commitment.

Both make good arguments, and both articles are worth reading.


Oatzy.

Wednesday, September 29, 2010

The Internet Vs. Anti-Piracy

Okay, so this started a few weeks back, when anti-piracy firm AiPlex Software were hired to have torrents of some Bollywood films taken down from indexing/tracking sites (including The Pirate Bay) - which don't host infringing files, but provide links to them.

They made their requests, and when the sites in question didn't comply, AiPlex went a step further than any other anti-piracy company had, and DDoSed the sites (temporarily).

Now setting aside for a second the fact that DDoSing is illegal, this was not the wisest move they could've made.


Anonymous
"This is unacceptable to Anonymous. The time has come to show these fuckers that we will not tolerate this."

"Operation Payback is a Bitch" was devised as a means of retaliation by Anonymous, in close association with infamous image-board site, 4chan (among others).

Anonymous are an interesting group. As internet activists, they may well stand for justice, but their definition of justice isn't always in line with, say, the law's - it's very vigilantist. The important thing to keep in mind though, is that Anonymous is akin to a flash mob, and therefore any given 'mob' is going to vary in membership, and even overall ideology, etc.

But in general, Anonymous will act in favour of whatever gives them lulz, or makes a lamentable person suffer.

This can be seen in their attacks on Scientology, on Sarah Palin, their tracking down of Dusty the cat's tormentor and cat-bin-lady or, perhaps most unsettling of all, their tormenting of an 11 year old girl (who, to be fair, probably had it coming). To name but a few.

Long story short, anyone who knows anything about 4chan and Anonymous will know that no sane person would ever risk incurring their wrath.


DDoS

Denial of Service occurs when a server is overloaded with requests. When this happens, the server in question will either slow to a crawl and eventually become unavailable to new requests, or reset itself. In cases where the site is hosted on the same servers as a company's email, backups, assorted storage, etc., this can be quite inconvenient.

NB/ Facebook's recent outage was an example of a self-induced denial of service.

A Distributed Denial of Service attack (DDoS) occurs when a group of individuals use a program - such as the Low Orbit Ion Cannon (LOIC) - to make hundreds, or even thousands of requests per second.
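
The overload itself is just arithmetic: once requests arrive faster than the server can answer them, the backlog grows every second until new requests get turned away. A toy model of that, with made-up capacity and traffic figures (none of these numbers come from the actual attacks), might look like this:

```python
def backlog_over_time(arrivals_per_sec, capacity_per_sec, seconds, queue_limit):
    """Simulate a server's pending-request queue, one second at a time.

    Returns a list of (backlog, dropped) tuples. All parameters are
    illustrative; real servers are far more complicated than this.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrivals_per_sec                 # new requests arrive
        backlog -= min(backlog, capacity_per_sec)   # server answers what it can
        dropped = max(backlog - queue_limit, 0)     # overflow is turned away
        backlog -= dropped
        history.append((backlog, dropped))
    return history


# Hypothetical numbers: the server copes with 500 requests/second.
normal = backlog_over_time(200, 500, seconds=5, queue_limit=1000)
# 1,000 participants each sending 2 requests/second = 2,000 requests/second.
flood = backlog_over_time(1000 * 2, 500, seconds=5, queue_limit=1000)

print("normal:", normal)
print("flood: ", flood)
```

With normal traffic the queue stays empty; under the flood it pins at its limit and most new requests, legitimate ones included, are dropped. That is the 'denial' part.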

In the case of the Australian anti-piracy group AFACT, their site was hosted on a cluster server, so when that site was taken out, all (supposedly) 8,000 other sites on the cluster went down as well, including small business and government websites.

In general, DDoSing is more useful for making a statement, since the result is usually only temporary. But sometimes.. well, we'll get to that...

Again, this was simply overwhelming a server by flooding it with requests. This was done mostly by Script Kiddies running LOICs - which is as simple as entering the target IP and hitting "IMMA CHARGIN MAH LAZER".

To call this hacking is being far too generous; and from a legal point of view, inaccurate.


Operation Payback

A call to arms was posted on 4chan, and various other sites, demanding retaliation.

This started with revenge upon AiPlex, along with MPAA and RIAA. Then they went after notorious UK anti-piracy law firm and all-round bastards, ACS:Law, who have been guilty of sending out vast numbers of 'menacing' letters demanding money for copyright infringement.

The first attack on them took the site down for a couple of hours.

Head of the company, Andrew Crossley, made the remark:
"It was only down for a few hours. I have far more concern over the fact of my train turning up 10 minutes late or having to queue for a coffee than them wasting my time with this sort of rubbish."
Now I don't know about you, but antagonising your attackers seems like a poor move. Anonymous attacked again, this time knocking out the server and causing it to reset.

But here's what happened this time:
“Their site came back online [after the DDoS attack] – and on their frontpage was accidentally a backup file of the whole website (default directory listing, their site was empty), including emails and passwords,”
Basically, instead of showing the homepage (as it should have), it showed the file directory, on which was found a zip file containing all the emails, etc., all unencrypted.

Unsurprisingly, the backups were then put on The Pirate Bay, where they've been shared by lots and lots of people, out for revenge.

As of this time, there have been no reports of victims' sensitive data being used maliciously. Rather, downloaders are more interested in destroying ACS. And, in fact, some of the people who have downloaded the emails have tried to contact and alert the victims whose personal information has been exposed.

Oh, and attacks on various other anti-piracy websites are still ongoing. See here for more information or to participate.


Data Protection Breach


First of all, there's the personal information that's been leaked, including that of thousands of BSkyB customers. This is the part of the story being reported by various outlets.

And if anything, it's that which will get ACS in trouble, since this is a fairly major breach of the Data Protection Act - given that the data was unencrypted and was revealed by a clumsy server, rather than hacking (and PI acknowledge this wasn't hacking).

Word is, the punishment will be a fine of around £500,000, plus disciplinary action from the Solicitors Regulation Authority - and this wouldn't be the first time they'd faced disciplinary action from the SRA. So even if the company isn't completely destroyed by all this, they'll still have to pay up more than twice their takings from 'infringers' (~£220,000) - and Crossley might just have to give up that Jeep Compass 2.4CVT he bragged about.

Which is nice. But there was other interesting stuff in there.


Money

It is demonstrably true that ACS cares more about making money for itself than enforcing copyright or protecting artists.

The emails show that ACS were taking approximately 50% of the money retrieved from those accused. And in fact, only about 30% of the money retrieved went to the copyright owners. Which, to me, seems a bit off, in terms of fighting for the rights owners.

No. ACS:Law have jumped on copyright infringement as a way to make a quick buck for themselves, and frankly they deserve everything they get.

A lot of this has been classed as "legal blackmail". We see letters and emails from a vast number of people who, quite obviously, have been wrongfully accused - including old people who are very confused by the claims - who are still paying up because they don't want to be taken to court. And in a lot of the cases the victims have had to ask if they can pay in installments, since they can't afford to pay the lump sum.

NB/ Consumer group Which?, local councils and judges (amongst others) have reported receiving large numbers of complaints from people who have been harassed by ACS.

From what I understand, the victims are being accused of infringing individual (porn) movies or individual songs, and in each case ACS are offering a settlement payment of £495 - just below the "psychological barrier" of £500.

This being the 'claimed' damages resulting from sharing one movie - therefore implying that each infringer shared the movie with, on average, ~49 people. How do they justify this figure? They don't, and indeed can't (see below).

£495 is also low enough that it wouldn't be worth an accused individual disputing the claim in court, given the legal fees would be much higher than that amount - supposedly around £10,000.
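
Putting rough numbers on that: the ~49 figure works out if each copy is valued at about £10, which is my own back-calculation rather than a number taken from the emails. The sketch below just redoes the arithmetic with the figures quoted in this post.

```python
SETTLEMENT = 495          # pounds demanded per letter
LEGAL_FEES = 10_000       # supposed cost of fighting it in court
ASSUMED_COPY_VALUE = 10   # pounds per movie: an assumption, not from the emails

# How many copies the demand implies at the assumed price.
implied_copies = SETTLEMENT / ASSUMED_COPY_VALUE

# How many settlements you could pay for the cost of one court fight.
fees_vs_settlement = LEGAL_FEES / SETTLEMENT

print(f"implied copies shared: ~{implied_copies:.1f}")
print(f"legal fees are ~{fees_vs_settlement:.0f}x the settlement")
```

So even someone who is certain they're innocent is looking at costs around twenty times the settlement to prove it, which is the whole point of the pricing.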


IPs and ISPs

In terms of tracking down file-sharers, ACS pays 'monitoring companies', who will find by various methods, what IP address is sharing a given file, and at what time. ACS then contacts ISPs asking for the physical addresses (supposedly) linked with the 'infringing' IPs.

The main ISP that's gotten very upset by the email leak is BSkyB, though there are others, including BT and PlusNet.

ISPs Virgin Media and TalkTalk, on the other hand, have both refused to give out such information.

To quote a commenter on Slashdot:
"ACS:Law were using Norwich Pharmacal civil orders against the ISPs; they basically demand information relevant to a future court case from a third party, in this case the ISP. Sky broadband chose not to contest these court orders, and just supinely handed over the data. Nor did they notify their subscribers that such an order was taking place, so they could fight it if they chose.

In fact, ACS:Law were combining these requests into huge tranches of data - one such recent was 25,000 BT Broadband IP addresses, expected to ID 15,000 subscribers.

Virgin and Talk Talk refused to go along with these orders without a fight - potentially forcing ACS:Law to do a Norwich Pharmacal order per individual IP, which would be ruinously expensive - so the leaked emails reveal that ACS:Law specifically did not target them."

If only other ISPs had the balls to say no as well...


The Revelation

If you'd seen the numbers, you'd be wondering why ACS don't pursue those who dispute the infringement claims, or just outright ignore them. In fact, only 30% of those sent letters settle and pay the £495.

The reason - ACS are aware of how flimsy their case would be, and how easy it would be to contest the claim of infringement.

So their problem is two-fold. First, an IP address isn't always very useful. IP addresses can be spoofed, or a smart user can hide behind a proxy - both meaning the IP address obtained will not match the one at the physical address of the infringer.

Or conversely, if the owner of a given IP address uses an unsecured wi-fi connection, for example, then again someone's going to get wrongfully accused.

But secondly - and this really is interesting - in one email, Crossley's own legal adviser says the following:
"establishing damages beyond the value of the gross profit of one copy of the work is problematic."

Basically, because of the way the monitoring works, they can only prove that the 'infringer' was downloading a copy of the infringing file to their computer.

In other words - while they can assume a user was 'sharing' the file, they can't prove it. And they certainly can't determine the number of people (if any) the user in question was 'sharing' with.

A law firm could (and do) demand greater damages. But if an infringer could prove, for example, that they 'leeched' the file - that is, downloading without uploading (sharing) to other users - then the only damages the firm can claim are for one copy of the file.

To Quote Ars Technica*:
"under UK law, damages are fixed at "economic loss, either realised or potential." When it comes to music tracks, the loss equals "the approximate market value of the track as a single download—79p.""

They would have been fools to take anyone to court, knowing that a defendant could potentially pull that defense, considering how the legal fees would far outweigh that pay-off. And on top of that, if they lost one case, they'd no longer be able to demand £495 - the end of their little extortion business.
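
To put the gap in perspective, here is the same comparison as a quick calculation. Every figure is one quoted in this post; it's a back-of-envelope sketch, not a model of how a court would actually assess damages.

```python
SETTLEMENT = 495.00        # pounds demanded per letter
TRACK_VALUE = 0.79         # market value of one track, per the Ars Technica quote
SETTLE_RATE = 0.30         # share of letter recipients who pay up
ACS_SHARE = 0.50           # approximate cut kept by ACS
OWNER_SHARE = 0.30         # approximate cut reaching the copyright owners

# How many times larger the demand is than the provable loss for one track.
markup = SETTLEMENT / TRACK_VALUE

# Average income per letter sent, given that most people don't pay.
income_per_letter = SETTLE_RATE * SETTLEMENT

print(f"demand is ~{markup:.0f}x the value of one track")
print(f"average take per letter: £{income_per_letter:.2f}")
print(f"  of which ACS keeps ~£{income_per_letter * ACS_SHARE:.2f}")
print(f"  and rights owners get ~£{income_per_letter * OWNER_SHARE:.2f}")
```

A demand hundreds of times the provable loss only holds up as long as nobody tests it in court, which is why sending letters pays and suing doesn't.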


Warning

Don't think this renders pirating essentially risk-free. It doesn't. Especially not if you're in America.

Since, in America, copyright holders don't need to prove actual economic harm to demand (outrageously large) statutory damages, the RIAA and MPAA are still milking (alleged) infringers for every last penny they have - and then some.

NB/ The RIAA even tried to sue a deceased woman, who didn't even own a computer while alive(!).

But if you are ever accused of infringement and believe it to be wrongful, tell them so. If you need more advice, visit Being Threatened.

If, on the other hand, the accusations are true, use your best judgement. If you think the settlement is just, pay it. If not, you can either try ignoring it, or seek legal advice in so far as disputing the amount demanded.

And in fact, if you're going to pay up the value of the files you downloaded, you may as well do so by buying physical copies, rather than giving the likes of ACS ~50% of that money.

But if you are summoned to court, you'd better show up AND get yourself a good solicitor, or you could find yourself paying up to the thousands.

Or, you know, you could just not do it in the first place..


Oatzy.


* This quote and lots more details about what bastards ACS:Law are can be found here.