work notes


I've been manually keeping track of my sleep patterns in this Google doc, but as I mentioned before, I want an icon on my phone that will automatically update this spreadsheet.

And I've decided to HumanScript this... here is the script (which has been posted on oDesk):

- create public repo on github (with a default README file)
- put these instructions into the repo in humanscript/AA577/hs.txt
- create two simple icons (they can look crappy)
    - one icon represents "awake" (like a face with eyes open)
    - one icon represents "asleep" (like a face with eyes closed)
- create an android widget that behaves as follows
    - it allows the user to add a widget to their main app area
    - at first it shows the awake icon
    - when the user clicks the icon, it changes to the asleep icon
    - when the user clicks the icon again, it changes back
- test the widget
    - install the widget
    - take a screenshot showing the awake icon
    - click it
    - take a screenshot showing the asleep icon
    - add these screenshots to the repo in humanscript/AA577/
- create simple instructions for installing the app
- put these instructions in the README file
- send me a link to the github repo

- in your cover-letter, please summarize what I want.


Note that this is just a starting point. It creates a widget that I can click. It doesn't actually send any notification to a server when I click it. I plan to add that in a second HumanScript.

Machine Learning

I was getting stuff settled on an ec2 machine that I thought would be the final destination for my classifier, but things have changed a bit, and I don't think that ec2 machine will be the final resting place for my code. So I've brought my code back to my own machine, which will make hacking on it a bit easier anyway.

My goal at this point is to create an ROC curve, because I want to see how much we can reasonably expect to gain from this classifier. That is, we only want to trust things that the classifier is darn sure about, and give everything else to someone to look at. So I've trained on some data, and I ran liblinear's predict on some test data, and I have a file with logistic regression probabilities for each test item. There is probably some utility that will convert this file into an ROC curve for me. Let's see.. well, my brief search didn't turn up anything.

Hm.. question: the file looks like this:

labels 1 0
0 0.0110771 0.988923
1 0.985343 0.0146575
0 0.0264095 0.973591

The 0 or 1 in the first column is the label, but is it the predicted label, or the real label? How could it possibly be the real label, you ask? Well, because my test data has real labels in it, and liblinear didn't balk at that, which suggests it may have been expecting them, i.e., expecting labeled test data rather than new/unlabeled data.

I guess I can see if these labels ever disagree with the second and third columns (which I assume are probabilities of labels 1 and 0 respectively, and add up to 1).

..Answer: they are the predicted labels, i.e., they correspond perfectly to whether the second column (the probability of label 1) is greater than 0.5.
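For the record, the check can be sketched roughly like this. It is only a sketch: the function names are made up for illustration, and it assumes the file always has the "labels ..." header line followed by one row per test item, as in the sample above.

```python
def parse_predictions(text):
    """Parse a prediction file like the sample above into (label_order, rows)."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    if header[0] != "labels":
        raise ValueError("expected a 'labels ...' header line")
    # The header gives the order of the probability columns, e.g. [1, 0].
    label_order = [int(x) for x in header[1:]]
    rows = [(int(r[0]), [float(p) for p in r[1:]]) for r in body]
    return label_order, rows


def labels_match_argmax(label_order, rows):
    """True if every first-column label is the label with the highest probability."""
    for label, probs in rows:
        most_likely = label_order[probs.index(max(probs))]
        if most_likely != label:
            return False
    return True


sample = """labels 1 0
0 0.0110771 0.988923
1 0.985343 0.0146575
0 0.0264095 0.973591
"""

order, rows = parse_predictions(sample)
print(labels_match_argmax(order, rows))
```

If this prints True for the whole file, the first column carries no information beyond the probabilities, which is what you'd expect from predicted labels.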

So, this file isn't enough in itself to create an ROC curve anyway.. I need the real labels as well.
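Given the real labels, the curve itself is not hard to compute by hand. Here is a minimal sketch in plain Python, not necessarily how I ended up doing it: sort the items by their probability of label 1, sweep the threshold downward, and record the false-positive rate and true-positive rate at each step. The function names and the toy labels and scores below are invented for illustration.

```python
def roc_points(true_labels, scores, positive=1):
    """Return (fpr, tpr) points obtained by sweeping a threshold over scores."""
    pos = sum(1 for y in true_labels if y == positive)
    neg = len(true_labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Highest score first: lowering the threshold admits items in this order.
    ranked = sorted(zip(scores, true_labels), key=lambda t: -t[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Items tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data, purely illustrative: real labels and P(label 1) per item.
toy_labels = [0, 1, 0, 1]
toy_scores = [0.01, 0.98, 0.6, 0.4]

curve = roc_points(toy_labels, toy_scores)
print(curve)
print(auc(curve))
```

The points can then be handed to any plotting tool, with the false-positive rate on the x axis and the true-positive rate on the y axis.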

And here is the ROC curve:

[figure: ROC curve]

Hm.. not quite as good as I was hoping. But I think there's room for improvement (I've seen a better ROC curve on this same data — well, similar data — so my choice of features is probably not ideal).

But, if we went with this, then using a low probability threshold of 0.01, we get 97% recall, with a 67% false-positive rate, which I think means that 33% of the good items would be deemed good by the system, while 3% of the bad items would slip through with them. Is 3% too high? I suspect yes.. but I'm not sure. I'll need to discuss this with people to decide how much to try and improve the system.
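One thing worth keeping in mind for that discussion: recall and false-positive rate are rates within each class, so the share of bad items among the ones the system lets through also depends on how common bad items are in the first place. A quick sketch of the arithmetic, assuming the flagged (positive) class is the bad items. The bad_rate values are hypothetical, since I haven't pinned down the real proportion here.

```python
def accepted_bad_fraction(recall, fpr, bad_rate):
    """Fraction of auto-accepted (unflagged) items that are actually bad.

    recall   : share of bad items the classifier flags (true-positive rate)
    fpr      : share of good items the classifier flags (false-positive rate)
    bad_rate : share of all items that are bad (hypothetical input)
    """
    missed_bad = bad_rate * (1 - recall)       # bad items that pass unflagged
    passed_good = (1 - bad_rate) * (1 - fpr)   # good items that pass unflagged
    return missed_bad / (missed_bad + passed_good)


# Same operating point as above, under a few made-up base rates.
for bad_rate in (0.01, 0.1, 0.5):
    frac = accepted_bad_fraction(recall=0.97, fpr=0.67, bad_rate=bad_rate)
    print(f"bad_rate={bad_rate:.2f} -> {frac:.2%} of accepted items are bad")
```

So the same operating point can look fine or not depending on the base rate, which is one more number to bring to the conversation.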
