3/10/13

the nar-nar saga continues...

The database rose from the grave, thanks to mongoHQ.. so now I need to do something with it..

here's where I'm at:
- I have the database of deleted items
- I have csv dumps of previous spreadsheet inputs
- I still don't have quite all the profileKeys I need

here's what I want:
- currently I'm storing a history of actions inside each record.. I want a separate history table instead, since we're more interested in the actions taken by a user than in the actions taken on a record, and I need a place to store actions like "grabbed a batch of records".. the main motivation is that I'm going to be giving someone else direct access to the database, and I want it to be comprehensible for them (the more comprehensible it is, the less help they need from me).
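
a rough sketch of what one entry in a separate history collection could look like, in Python with plain dicts standing in for the database.. the field names (user, action, records, time) and the helper names are made up for illustration, not the actual nar-nar schema:

```python
import time


def make_action(user, action, record_ids, now_ms=None):
    """Build one document for a separate history collection.

    All field names here are hypothetical, chosen only for the sketch.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # milliseconds since the epoch
    return {
        "user": user,                  # who did it
        "action": action,              # e.g. "grabbed a batch of records"
        "records": list(record_ids),   # which records were involved
        "time": now_ms,
    }


def actions_by_user(history, user):
    """Everything one user did, in the order it was stored."""
    return [a for a in history if a["user"] == user]


history = [
    make_action("alice", "grabbed a batch of records", ["r1", "r2"], now_ms=1000),
    make_action("bob", "assessed", ["r1"], now_ms=2000),
    make_action("alice", "assessed", ["r2"], now_ms=3000),
]
```

with the actions in their own collection, "what did this user do" is a single filter on user, and an action that touches many records (or none) still has somewhere to live.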

step 1: start storing new actions in a new table/collection.. done
step 2: backup database.. done
step 3: remove the "history" attribute from all the records.. done
step 4: remove img, name, new_id, overview, title
    note: I'll want to remove obo and username, since those can be reconstructed from _id now
step 5: set obo and username in the same function that fetches title and overview from oDesk.. done
step 6: remove username and obo from records..
    oops, before I do that, let's make the uploader not add these fields either.. well.. let's let the updater run first..
hum-de-dum.. it's running.. waiting.. ok, it finished..
so now I need to re-remove all the stuff I don't want: img, name, new_id, overview, title, username, obo.. done
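
for reference, removing fields from every record in MongoDB is usually done with the $unset operator.. a small sketch that builds the update document and mimics its effect on a plain dict (the sample record and helper names are invented; with a recent pymongo the update document could be passed to something like collection.update_many({}, update), but that part is an assumption and not shown running here):

```python
UNWANTED = ["img", "name", "new_id", "overview", "title", "username", "obo"]


def unset_update(fields):
    """Build a MongoDB update document that removes the given fields."""
    # $unset ignores the value; an empty string is the conventional placeholder
    return {"$unset": {field: "" for field in fields}}


def strip_fields(record, fields):
    """Local stand-in for $unset: a copy of the record without those fields."""
    return {k: v for k, v in record.items() if k not in fields}


# hypothetical record, just to show the effect
record = {"_id": "abc", "title": "t", "obo": "o", "status": "open"}
cleaned = strip_fields(record, UNWANTED)
```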
..and I now need to check in the change that stops the uploader from adding these fields.. done.. I just need to remember to check that it works the next time the uploader is executed..

step 7: merge mongoHQ's backup into my recent backup, marking the recovered items as recovered=true.. upload recent backup to "narnarRecent".. good, now upload the HQ backup into "narnarHQ".. good.. now write script to copy items from narnarHQ which don't already appear in narnarRecent and put them there, adding the "recovered" flag..
..oops, forgot to set recovered flag.. let's kill narnarRecent and restore it again.. done..
ok, the script is running..
..this should put me in a position where I have just one backup database to do stuff from,
and there are a couple things I want to do from it..
- restore the recovered items into heroku's database
- re-create the histories into the new history system..
- check the spreadsheets to see if there are items that never even made it into nar-nar because of failed inserts..

bwahahaha! the script was taking a while checking for the existence of items, and on-the-fly, I added an index, and then suddenly I get about 100-fold speedup :)
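
the merge from step 7, sketched in Python over in-memory lists rather than real collections.. the set of known ids plays the same role the index did: the "does this item already exist" check becomes a quick lookup instead of a scan.. merge_recovered and the sample items are hypothetical:

```python
def merge_recovered(recent, hq):
    """Copy items from hq that are missing from recent, flagged as recovered."""
    known_ids = {item["_id"] for item in recent}  # fast existence check
    merged = list(recent)
    for item in hq:
        if item["_id"] not in known_ids:
            copy = dict(item)          # leave the hq item untouched
            copy["recovered"] = True   # mark it as coming from the HQ backup
            merged.append(copy)
            known_ids.add(item["_id"])
    return merged


recent = [{"_id": "a"}, {"_id": "b"}]
hq = [{"_id": "b"}, {"_id": "c"}]
merged = merge_recovered(recent, hq)
```

items already present in the recent backup are left alone, so only the genuinely recovered ones carry the flag.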


ok. I also want to put the _on_ice.txt items into narnarRecent.. done

now let's see which items from the spreadsheets are not in narnarRecent.. hm.. a lot it seems.. but they're old.. I think they're before the start of nar-nar..

the oldest item in nar-nar is 1361491264000 (Feb 21st).. let's see if there are any items after that time.. nope.. super
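
a small sketch of that check in Python.. the timestamp is in milliseconds, and in UTC it lands just after midnight on Feb 22nd, which is the evening of Feb 21st in US time zones.. the "time" field name and the sample rows are assumptions for illustration:

```python
from datetime import datetime, timezone

OLDEST_MS = 1361491264000  # oldest item in nar-nar, milliseconds since the epoch


def ms_to_utc(ms):
    """Convert a millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def missing_after(items, cutoff_ms, known_ids):
    """Items newer than the cutoff that are not among the known ids."""
    return [
        item for item in items
        if item["time"] > cutoff_ms and item["_id"] not in known_ids
    ]


# hypothetical spreadsheet rows: one older than nar-nar, one newer but present
rows = [
    {"_id": "old", "time": OLDEST_MS - 1000},
    {"_id": "new", "time": OLDEST_MS + 1000},
]
leftover = missing_after(rows, OLDEST_MS, known_ids={"new"})
```

an empty result means nothing created after the start of nar-nar is missing.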

so now all that remains is the following:
- recover missing records from narnarRecent into heroku
- recreate history from narnarRecent into heroku

to do this, I'm just waiting on the profileKeys, which should come when the data warehouse updates.. I'm not sure how often it does that though..

I guess I don't need the profile keys for the history.. though I sort of do, since I need to recreate histories for some records which don't exist.. I suppose I can do that fine, since the history just shows what the record looked like when the user did stuff to it, and I have that information..

though I don't really have that information — I just have what the record looked like when it was first created.. hm..

I feel like the old history is impure.. hm.. what is its purpose.. its purpose is supposed to be checking the quality of people making assessments..

hm.. hm.. I'll ask about the history. I don't think it will actually get used, so maybe I can just provide an archival dump of what I currently have.
