Today I had a quick look at the Many Eyes project. It looks very interesting.
You can make your own visualization using their nice collection of data sets.
Look at this example:
Useful data sources (permanently under development):
Harvard IQSS Dataverse – all sorts of data, including the datasets from Mostly Harmless Econometrics
As you might be aware, webmail at FAS is not forwarding properly; it drops emails once in a while. On the other hand, it is still the best solution thanks to its speed, spam filter, and overall integration, but it can be complemented with POP3. The following is my solution.
Log into webmail.fas.harvard.edu and, under Settings, go to mail forwarding. Insert your Gmail address.
In your Gmail mailbox, you can filter messages that were sent to your Harvard address (“Filter messages like this”): skip the Inbox, apply a “harvard” label, do as you wish. Then in Settings, go to Accounts and Import, click “Check mail from other accounts (using POP3),” and add the new account; Gmail sets almost everything up automatically. I download everything to my Gmail account (and access it over IMAP from my computer using Mozilla Thunderbird with GPG).
You get your emails immediately via forwarding, but if Harvard’s webmail forgets to forward one (at least twice a month in my case), you still get it through POP3.
When we have a task that would take a long time, we can usually think about parallelization. In this post I will show how to deal with a large shared data set (but not one so big that you would need MapReduce).
Let’s first look at how to set up a cluster in R:
Cluster set-up using doSNOW
Revolution Analytics pulled doMC; therefore, I am using doSNOW.
library(doSNOW)
numberofcores <- 4
cl <- makeCluster(numberofcores, type = "SOCK")
registerDoSNOW(cl)
foreach (ind = 1:1000) %dopar% foo_with(bigdata)
There are two issues here. First, this code fails with an error that the function foo_with could not be found on the workers; second, you are shipping a lot of data to every worker, which slows everything down.
A solution to both problems
Push the data onto your cluster by:
clusterExport(cl, "bigdata")
The function can either be pushed by clusterExport as well, or we can let clusterApply or clusterApplyLB ship it for us:
clusterApplyLB(cl, array, foo_with_rewritten, ...)
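Putting the pieces together, here is a minimal self-contained sketch. The matrix contents, bigdata, and foo_with_rewritten are placeholders of mine; substitute your own data and function:

```r
library(doSNOW)  # attaches snow and foreach as well

# Placeholder shared data set and worker function.
bigdata <- matrix(rnorm(10000), nrow = 100)
foo_with_rewritten <- function(ind) sum(bigdata[ind, ])

cl <- makeCluster(4, type = "SOCK")

# Ship the large object to every worker once, instead of with every task.
clusterExport(cl, "bigdata")

# Load-balanced apply: each task now only sends an index, not the data.
results <- clusterApplyLB(cl, 1:100, foo_with_rewritten)

stopCluster(cl)
```

Because the workers already hold bigdata in their global environments, each clusterApplyLB task transfers only a single index.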
This post showed a solution that sits between simple multicore computing (doMC or similar) and a cluster that needs MapReduce.
This post will be updated continuously. I will show how you can use Git and Dropbox for collaboration.
If you don’t have a Dropbox account, please use this link and we both get an additional 250 MB: http://db.tt/4ZKL2z3
Set up the main repository in one directory in Dropbox. Then clone the repository for every person working on the project into their own directories (still in Dropbox).
Once you have this set up, fetch from the main repository into the clones, and when you want to consolidate, push back to the main one. Merging can take some time, and you will most likely need kdiff3 or a similar merge tool.
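The workflow above can be sketched as follows. The paths and names are my assumptions, and I use a bare repository for the main copy, which avoids the problems of pushing into a checked-out branch:

```shell
set -e
# Main repository: bare, sitting in Dropbox so it syncs to everyone.
git init --bare ~/Dropbox/project-main.git

# Each collaborator clones it into their own directory (still in Dropbox).
git clone ~/Dropbox/project-main.git ~/Dropbox/alice-project
cd ~/Dropbox/alice-project
git config user.name "Alice"
git config user.email "alice@example.com"

# Work locally, commit, then consolidate by pushing back to the main repo.
git commit --allow-empty -m "first commit"
git push origin HEAD
```

Dropbox then syncs both the main repository and each clone, so collaborators pick up pushed changes with an ordinary git fetch.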
What to do if you get “Does Not Appear to be a Git Repository” on windows:
Go into your .git directory and change the location entry in the config file from the ‘/cygdrive/c/…’ form to the ‘c:/’ form.
Add remote to your “main” version.
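For that last step, a sketch with hypothetical paths (substitute your own; on Windows use the c:/ form as noted above):

```shell
set -e
# Hypothetical main repository and clone, for illustration only.
git init --bare /tmp/project-main.git
git init /tmp/my-clone
cd /tmp/my-clone

# Point a remote named "main" at the main repository and verify it.
git remote add main /tmp/project-main.git
git remote -v
```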
I don’t like this one, but it is there: