Has the self-tracking movement gone too far? Indiana University has released a smartphone app that lets users report their sexual behavior. The idea is that this data could be aggregated for research: the team plans to make the anonymized data it collects available to researchers.

Here’s the link to the story.

I’m sure you all rushed to sign up, right? Who in their right mind would knowingly report their sexual activity and then trust some grad students to keep it anonymous?

From the research side, we have to be very careful analyzing such data. Clearly there is a bias in the type of person who will download such an app and report their behavior. This was a fatal flaw of the original Kinsey Reports, the groundbreaking large-scale sex surveys published in 1948 and 1953. Kinsey gave his sex survey to anyone he thought would be likely to take it: he ended up with lots of college undergraduates and prison inmates (really!), clearly *not* a representative sample. Kinsey’s widely reported estimate that 10% of adults are homosexual was fraught with error and was eventually widely discredited (most experts agree the number is between 1 and 4 percent, depending on the definition). I don’t know how the makers of this app plan to get around such self-selection biases.
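To see how much self-selection alone can distort an estimate, here is a minimal Python sketch. It assumes, purely hypothetically, a true rate of 3% and that people with the trait are three times as likely to volunteer (30% versus 10%); these are made-up numbers, not Kinsey’s or the app’s. The first function gives the expected share among volunteers analytically; the second simulates it with a seeded random draw.

```python
import random


def expected_observed_rate(p, opt_in_with, opt_in_without):
    """Share of volunteers who have the trait, when people with the trait
    opt in at a different rate than people without it."""
    with_trait = p * opt_in_with
    without_trait = (1 - p) * opt_in_without
    return with_trait / (with_trait + without_trait)


def simulate_observed_rate(p, opt_in_with, opt_in_without, n=200_000, seed=42):
    """Draw a population, let each person decide whether to volunteer,
    and measure the trait rate among the volunteers only."""
    rng = random.Random(seed)
    volunteers = 0
    trait_count = 0
    for _ in range(n):
        has_trait = rng.random() < p
        opt_in = opt_in_with if has_trait else opt_in_without
        if rng.random() < opt_in:
            volunteers += 1
            trait_count += has_trait  # True counts as 1
    return trait_count / volunteers


true_rate = 0.03  # hypothetical
print(expected_observed_rate(true_rate, 0.30, 0.10))  # roughly 0.085
print(simulate_observed_rate(true_rate, 0.30, 0.10))
```

Under these assumptions the volunteers show a rate of about 8.5%, nearly three times the true 3%, and collecting more volunteers does not help: the bias is in who opts in, not in the sample size.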

But who cares what other people are doing in bed? Well, lots of people do. For one, it’s very interesting! Check out this site analyzing the sex practices of 1000 representatively selected Brits. Go ahead… we’ll wait for you to come back, even if it takes some time. Fascinating, no? And what’s the first thing you do on a site like that? You probably compare your own habits, numbers, and practices to that collection of people. We all want to know how ‘normal’ we are, right? I think that is a positive development: there may be people out there who think they are unusual, or deviant in some way. Data like this makes it clear that no matter how freaky you think you are, there are likely to be lots of other ‘normal’ people out there just like you. That can be powerful, especially for adolescents who are struggling with their sexuality and worried that they might not be ‘normal’.

So although it might seem crazy to collect this kind of data, and even crazier to volunteer it, I think it is *exactly* the type of data that is going to make the data revolution so powerful. When people start sharing detailed personal experiences about their bodies, their diseases, their symptoms, their reactions to drugs, their health care experiences, and yes, their sex lives, we can learn so much, and learn it far more efficiently than via an expensive NIH-funded clinical trial. But we do have to watch out for those pesky biases (which is why we statisticians will still have jobs :).

So, I hope I have convinced you to do your part. If so,  you too can download the sex tracking app from

Data collecting has never been this much fun.


Fruits of My Labor


I’ve learned a lot doing this project already, about myself, about self-tracking in general, about health.  But of course one of the reasons I started was not just to learn, it was to get healthier.  And so it is nice to have something concrete to hang onto, a tangible result that shows that all of this tracking and learning is worthwhile.  My blood test results were a nice start.   But tonight I had one of my best moments of my year of data:   I ran a 5k in 25:40, faster than I have run in the last 10 years.

That.  Feels.  Good.

You may remember when I blogged about my 5k times – here is an updated graph with the new point.  Sweet.

It was a perfect night for running, and a totally flat course, so that helps. Nonetheless, I was surprised. I haven’t been running much, but I have been cycling: in preparation for racing in the NJ Breve Fondo, I’ve been riding 15-20 miles on weekends. I am in pretty good shape in general; here is my weight graph since the project began:

Weight graph – made using

It’s a nice, steady, month-to-month decrease. Not too fast, just steady. There was a bit of a lapse during my July vacation, but I was able to recover from it fairly quickly.

I’m now 3/4 of the way through my year and I feel better than ever.  Hope I can keep up the momentum!

Eat, Drink, Data


Yesterday I posted about the improvement I saw over the past 6 months in my cholesterol levels and my bloodwork. So how did I do it? Overall, I basically made a commitment to eat better and exercise more. None of the changes I made were earth-shattering, but a lot of little changes add up. I once heard that we make 200 decisions a day about what we eat. I don’t have to change all of those decisions, but if I could change a few key ones, hopefully it would help.

I did some research. To lower triglycerides, eat lots of fiber, fruits and veggies, and fatty fish (omega-3s). For lowering overall cholesterol, go low carb. Fats are tricky: some say to avoid fats altogether, others say fats should be limited but are ok if they are unsaturated, and yet others say that you should bulk up on good fats (like olive oil) as much as possible to drive down the share of calories from carbs. How can I know who to believe!?!

I am a big fan of Michael Pollan and his manifesto: “Eat food. Not too much. Mostly plants.” Can’t argue with that one. So in that spirit, I rolled my own plan. It is based on a few rules that I don’t follow religiously, but try to keep in mind as I make my 200 decisions every day….

  • Take one fish oil pill daily: omega-3s help lower triglycerides
  • Try and eat more fish in general, especially fatty fish like salmon.
  • High fiber cereal every morning
  • Salad bar every day for lunch at work. I found lunch was the easiest meal to really change my habits and make a difference. I’d often eat a big sandwich, two slices of pizza, or the burrito special at lunch just because it looked good. After lunch I would often feel bloated and full, and then crash around 3pm. A salad with lots of veggies, some chopped chicken or ham, and a little dressing is yummy and filling, and I generally feel better through the day.
  • Lots of leafy greens on the salad. Spinach, not iceberg.
  • Cut back on sweets. Note that I didn’t cut them out, just cut back: smaller portions, fewer portions, and smarter choices. Sometimes just a little dark chocolate can satisfy my sweet tooth, which brings me to…
  • Cut back on snacking: that pre-bedtime cookie binge can add hundreds of calories and a lot of fat when you need them least. A handful of cashews and dried fruit can suffice if I am really hungry.
  • Less cheese. I used to eat a lot of cheese, and would gravitate towards restaurant meals smothered in it. No more.
  • Less fried food. Fried food creep is everywhere. Does sushi really need to be deep fried? I usually ask for a side salad instead of fries with my meal, especially since I can steal a few fries from my kids’ plates and don’t need half a plateful of ’em.
  • Do a detox plan 2-3 times a year

and last but not least:

  • More dark chocolate and red wine.  The jury is out on whether these are good for your heart, but some studies have shown it is true, and that is what I choose to believe 🙂

I’m kind of making it up as I go along. I have self-imposed rules; for instance, I am much more stringent during the work week than on the weekend. I don’t usually fret over what I eat for dinner, but focus more on portion size and not taking a second (or third) plateful. Anyway, it has worked for me, and I really don’t feel like I am denying myself much food pleasure (and I really *do* love to eat!).

Of course, this may all be a red herring.  My doctor told me when I got my first blood test that no one should ever get their blood tested during the holidays, because we all eat like crap from Thanksgiving until the end of the year.    That first blood test was on December 21.  He also warned me that even though I had fasted for 12 hours before the test, if I had eaten a cheeseburger as my last meal it would affect the test.  That was exactly what I had eaten (how did he know?).

For the second test, I fasted for longer and ate really well the day before. So who knows whether the improvement was due to this ‘gaming’ of the test, or really due to my diet/exercise regime? Guess we won’t know until the next data point!

Statistics of the Heart


Followers of this blog may remember that earlier this year I posted about my detailed bloodwork. I even posted a scan of it, with all of these fancy blood counts that I had never heard of before.

There were a few scary numbers on that chart. In particular, two numbers made me really worry. My triglyceride count was 209, in the high-risk range. And something I had never heard of, LDL particle count (LDL-P), was at 2164. I didn’t know what this meant, but it was pretty clear that it was bad, since the low-risk category was < 1000, and it only took a score of 1600 to be high risk. And I was at 2164!

Some friendly commenters on that post pointed me to a great series of posts on cholesterol measures and coronary risk (check out the nine-part series here when you have time!). That series argues that of all the numbers on that sheet, LDL-P is the most predictive risk factor. And it got worse. I had thought it was good that my LDL-C was reasonably low, but the series pointed out that this made me discordant: people with profiles like mine, high LDL-P and low LDL-C, are actually at the highest risk for heart disease!

Well, I was already motivated to eat better and exercise more because of my project, but this solidified it. I figured I would try some lifestyle changes and measure my blood again in 6 months. My doctor was skeptical and wanted to put me on statins, but I made him a bet: if there was no improvement in 6 months, I’d take them. Just like this entire project, stating my goals out loud and putting something on the line helps me stick to the program.

Anyway, the results are in…drumroll please….and I have some decent improvement!  yay!  Still got to work on this, but here are the main numbers:

  • Total Cholesterol  : Down from 199 => 190
  • Triglycerides : Down from 209 => 105  (High risk to low risk!)
  • HDL-C : Up from 29 => 37 (that’s good!)
  • VLDL : Down from 42 => 21 (high risk to low risk!)
  • LDL-C : 139 => 140 (still in a good range)

and the newfangled ones

  • LDL-P : 2164 => 1791 (still high risk, but big improvement)
  • small LDL-P : 1855 => 830 (gets me just out of the high risk category, but not by much)
  • Lp(a): 39.7 => 47.6 (this is the one bad change, and I don’t fully understand it)
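To put the before/after numbers on a common scale, here is a small Python sketch that computes the percentage change for each marker listed above. It also checks one assumption of mine, not something stated on the lab report: that the VLDL value was estimated with the common Friedewald shortcut, VLDL cholesterol ≈ triglycerides / 5 (in mg/dL). Both reported VLDL values match that shortcut after rounding.

```python
# (before, after) values exactly as listed in the post, in the lab's units
markers = {
    "Total cholesterol": (199, 190),
    "Triglycerides": (209, 105),
    "HDL-C": (29, 37),
    "VLDL": (42, 21),
    "LDL-C": (139, 140),
    "LDL-P": (2164, 1791),
    "small LDL-P": (1855, 830),
    "Lp(a)": (39.7, 47.6),
}


def pct_change(before, after):
    """Percentage change from the first test to the second."""
    return 100.0 * (after - before) / before


for name, (before, after) in markers.items():
    print(f"{name:18s} {before:>7} -> {after:>7}  {pct_change(before, after):+6.1f}%")

# Assumed Friedewald-style estimate: VLDL cholesterol ~ triglycerides / 5
tg_before, tg_after = markers["Triglycerides"]
print(tg_before / 5, tg_after / 5)  # 41.8 21.0
```

By this yardstick the biggest moves are triglycerides (about -50%), small LDL-P (about -55%), and HDL-C (about +28%), while Lp(a) moved about 20% in the wrong direction.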

I got these results by making some changes in eating, exercise, and general lifestyle. My next post will be about what those changes are, and what my next steps are. Nothing incredibly drastic, but definitely something that takes diligence, and that I hope to continually improve on. My father had coronary problems, starting in his late 50s, so this is something I really want to stay on top of.

Here are the blood numbers for those who know what they mean:

Getting old, 5K at a time


Jogging is my activity of choice to stay fit.  I was on my high school track team, and generally have tried to keep up with it in my adult life.  I’m a fair weather runner – I’m not one of those dedicated souls who are out running in the freezing rain of a NJ February, but in the spring-summer-fall I try to keep active with it.

As with everything else, it is good to have a focal point for training, and for me, that means running a few 5k races each year. They help keep me honest with my training, and give me goals to try to reach.

Races fit nicely into my project, because they record your time and post your performance publicly on a web page (just like I am doing). I have not kept good records of my running times over the past decade, but I was able to do a little online digging to see what races I could find. Here is a chart of the finishing times of the 5k races I was able to uncover:

5k race times for 1999-2012

There are a few interesting things to notice about this plot. First is the dead zone of about five years, from 2002 to 2007. Not coincidentally, this corresponds to my first five years as a dad. I did not stop running during this time, but I ran far less frequently, and often with a jogging stroller. I ran a few 5ks during this time, but some were with other people or with the kids, and I wasn’t running for speed, so I left those out.

You’ll see that my down time took its toll when I returned to action in 2007. But over the last few years I have been diligent about running a few races each year. I am amazed at how consistent my times have been in the last four years: 9 races, at different times of year and in different conditions, all within 60 seconds of one another. I had no idea I was that predictable!

But I was really interested in the trend line, especially going back 10-12 years. The slope of this line is the yearly increase in my time as I age. The slope is 0.17 (rounded), meaning that on average my 5k time has increased by 0.17 minutes, or roughly 10.3 seconds, per year. (Techie note: the line was fit using robust regression, rlm() in R, because of the sparse data and outliers.)
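The fit above used robust regression in R. To illustrate why a robust fit helps with sparse data and an outlier, here is a minimal Python sketch using a different robust method, the Theil-Sen estimator (the median of all pairwise slopes), rather than the R function itself. The race times below are hypothetical, not my actual results: they sit on a line with a slope of 0.17 minutes per year, plus one slow comeback race.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Theil-Sen robust line fit.

    Slope: median of the slopes between every pair of points.
    Intercept: median of y - slope * x over all points.
    A single outlier only affects the minority of pairs it belongs to,
    so it barely moves either median.
    """
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Hypothetical data: years since 1999 and 5k finish times in minutes.
years = [0, 1, 2, 8, 10, 11, 12, 13]
times = [24.0 + 0.17 * t for t in years]
times[3] += 2.5  # one slow "comeback" race, 2.5 minutes off the trend

slope, intercept = theil_sen(years, times)
print(f"{slope:.2f} min/year = {slope * 60:.1f} s/year")  # 0.17 min/year = 10.2 s/year
```

Multiplying the slope by 60 is the same minutes-to-seconds conversion used in the text; the outlier does not shift the fitted slope at all here, because 21 of the 28 pairwise slopes ignore it.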

So, is this a ‘normal’ aging curve? It turns out we have data for this!

Compuscore is a company that does the timing for 5k races, so they have lots of data on typical times for people of different ages. For every race they report, in addition to your time, what they call a “PLP” (Performance Level Percentage), to allow comparisons between men and women of different ages. You can think of it as a percentile for your age/gender class. I have consistently run at about a 50% PLP. On their web site, you can look up your PLP for a given time, or calculate the time for a given PLP. I came up with this table, which covers men in the age range of my plot above.

Age   5k time at 50% PLP   Increase from previous year (seconds)
30    25:52                -
31    25:55                3
32    26:00                5
33    26:06                6
34    26:13                7
35    26:22                9
36    26:31                9
37    26:42                11
38    26:54                12
39    27:06                12
40    27:18                12

This table says that a 30-year-old man who stays at the same level of fitness relative to his peers can expect his 5k time to increase by about 3 seconds over the following year. By the time you get to 41 (where I am now), your time is expected to increase by about 12 seconds per year! So my increase of 10.3 seconds per year over this period is roughly in line with what we would expect from the table.
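As a quick check on that comparison, this short Python sketch converts the table’s 50% PLP times to seconds, then computes the year-over-year differences and the average increase across ages 30 to 40. The times are copied from the table; the code is just an illustration of the arithmetic.

```python
# 50% PLP 5k times for men, ages 30-40, copied from the table
plp50 = {
    30: "25:52", 31: "25:55", 32: "26:00", 33: "26:06", 34: "26:13",
    35: "26:22", 36: "26:31", 37: "26:42", 38: "26:54", 39: "27:06",
    40: "27:18",
}


def to_seconds(mm_ss):
    """Convert a 'MM:SS' race time into total seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


secs = {age: to_seconds(t) for age, t in plp50.items()}
yearly = [secs[age] - secs[age - 1] for age in range(31, 41)]
avg_per_year = (secs[40] - secs[30]) / (40 - 30)

print(yearly)        # [3, 5, 6, 7, 9, 9, 11, 12, 12, 12]
print(avg_per_year)  # 8.6
```

Averaged over the whole decade, the expected slowdown is 86 seconds, or 8.6 seconds per year: a bit below my 10.3, while the rate near age 40 (12 seconds per year) is a bit above it.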

I thought I would zoom in on those last 4 years, the ones where I’ve been so consistent (see below).  Despite the consistency, the trend line is still upward (that is, slower).  BUT, my regression line slope has slowed to 8.6 seconds per year.  So, at least temporarily, it seems I am keeping one small step ahead of father time.

5k race times – zoomed in on 2009-2012.

CTV Release 2.0


I’ve updated my personal data on the Dropbox site.  Click the Download Data link on the sidebar and go to the “Current” Folder.  See here for details on what data is available.  There is a README file.  The current release has files for

  • Withings (Weight tracking)
  • Runkeeper (running/biking)
  • Fitlinxx (workout)
  • Livestrong (food tracking)
  • Rescuetime (productivity)

Have at it, fellow data scientists!


The strongest force in the universe


I am always amazed at the power of inertia.  This is the principle in physics that says that an object at rest tends to stay at rest, and an object in motion tends to stay in motion.  This is typically described in a physics classroom through the momentum of a moving object, or friction and resistance foiling your attempts to move that solid block sitting on the table.

In life, inertia rears its head in many ways. Take work. If you have ever been ‘in a groove’ at work, decided to work through lunch or dinner, and ended up totally late for wherever you were going, that is inertia. Conversely, if you have ever procrastinated through a morning, looked at the clock, and thought, “gee, it’s just a half hour until lunchtime, wouldn’t want to start anything too challenging,” that is the ugly flip side of inertia.

Nutrition and health work the same way. If I am eating well and exercising, regularly grabbing a salad for lunch instead of pizza and running in the mornings, it is easy to turn down the cookie at the Starbucks counter. Gotta keep the healthy momentum. But if I have had a binge-y weekend full of nachos and french fries, then why not have the chocolate lava cake for dessert? Inertia is behind the classic ‘I’ll start my diet tomorrow’ phenomenon.

This project has been chock full of inertia. After a gangbusters start, I lapsed, and only recently came out of it. Here are my Runkeeper running totals since I started tracking in November:

Monthly Running Totals via

After several months of solid running (in the winter, no less!), I fell off the map from March through May. Yeah, there were excuses: a re-organization at work, two new bosses, lots of meetings, coaching Little League games. All of these were *valid* excuses, but the real culprit was inertia. Once a week or two goes by without running, I know the next morning run will be tough. It will be harder to get up, I’ll get tired quicker, and oh, it’s kind of cold out, so I’ll just stay in bed!

Now that I have started again, the inertia works in the other direction. When the early alarm goes off I want to get up and going; if I don’t run for a day or two, I miss it, like withdrawal. A day without any exercise and I am drowsy and grumpy by 3PM.

Anyway, inertia hits the blogger too, since apparently I have not posted anything in months, since before my dark period.  So, hopefully this post is the breakthrough…I’ve got lots to update on…some things have worked and some have not so well, and I am due for an update on my data.  Stay tuned.


The Mother of all Self Tracking


I haven’t posted in a while and I am due to give my February update. But while you are waiting for that, here is an amazing analysis of self-data, collected over 20 years, by Stephen Wolfram, famous mathematician and designer of Mathematica and the Wolfram|Alpha ‘knowledge engine’.

Take a look – it’s pretty amazing:   Stephen Wolfram: The Personal Analytics of My Life

Here is a plot of all of his keystrokes, over 100 million of them (!) over 10 years.


Unfortunately, he doesn’t talk about how he collected all of this. Back in 1990, when he started collecting data, computers were very different beasts. It is amazing that he managed to collect this data across different eras and technologies. Awesome.

(LDL) Particle Man


I went to the doctor last month for a checkup and told him about my project. I mentioned that I wanted as much data as I could get about myself, so he recommended the ‘super-duper’ blood workup instead of the normal one. Here it is.

My Dec 2011 Bloodwork

So it turns out I have some issues. I’m in the red zone for triglycerides, VLDL cholesterol, and especially LDL-P and small LDL-P, where my values seem to be quite significantly off the chart. The standard HDL and LDL values, as well as the ratios, seemed to be in the normal range, but I failed the test for these newfangled measures. The -P in LDL-P stands for particle: it counts the number of LDL particles in your blood, and small LDL-P counts how many of those particles are the small kind.

Wow. So I did a little research on LDL-P. It has long been known that elevated LDL, combined with low HDL, is a risk factor for heart disease. But lots of people have heart attacks with normal LDL levels. Over the last ten years, research has been showing that not all LDLs are created equal: LDL particles come in different sizes. It turns out that, counterintuitively, small LDL particles are much more harmful than large ones. The small ones are the ones that get stuck in arterial walls, while the large ones are more soluble and float on by harmlessly. For two people with the same LDL levels, the one with smaller particles is much more at risk for heart disease.

Oh boy. I’ve got a family history of heart disease, so I think I need to take this seriously. I’m not really interested in starting myself on medication (statins) that will last the rest of my life. So, I’m going to really focus on my new regime of increased exercise and healthy eating to make a dent in these numbers. Also, I apparently need to bulk up on fiber and omega-3s (oat-encrusted salmon for dinner tonight, I guess!). I suppose I’ll give it 6 months of diet and exercise changes and then go for more bloodwork.

But in general, I’m a little concerned about overreacting to these kinds of tests. For one, I am not an expert. For two, this kind of medical testing and advice seems to go in fads and change every few years (eggs are bad! eggs are good!). So, for those of you out there who know anything about cardiac health and medicine, please feel free to share your view of my test scores and how serious this is. I’m happy to explore crowdsourcing my health care!

CTV Release 1.0


When I started this project, I pointed out that in order to keep myself honest I was going to open source my data.   I’m not sure if anyone will ever look at this, but in theory making it public gives me that little extra peer pressure to keep this project going.

Well the time has come for my first release!

I have created a public Dropbox repository (accessible here, and via a link on the sidebar) where anyone can access my data. It is broken into monthly folders, and has a README file with a little more detail. In brief, the monthly folders contain the following files:

Fitbit: Steps, daily floors climbed, distance traveled, and calories expended, according to the Fitbit. Also has my daily weight measurements and fat % from the Withings scale, and blood pressure measurements when I take them.

Livestrong:  Contains details of my food logging, with a summary file and a (very) detailed file with the food log and the nutritional breakdown.

Fitlinxx:  This is the system at my gym tracking my fitness there (weight training).  It lists machines used and weight lifted (um, not very much weight, I’m afraid)

Runkeeper:  Logs of non-gym related exercise. Mostly running, but also skiing or anything else I do.

Rescuetime: ‘Productivity software’ that tracks the programs I am using on my computer.  A way to keep track of lollygagging.  Procrastination, thy name is Facebook.

All of my data is up for the month of January, and is complete.  There is some data for previous months, but it is a bit more spotty.

Some facts from my first month:

  • Weight loss from 1/1 to 1/31: 4.9 lbs
  • Days I remembered to set the Fitbit to track my sleep: 23
  • Avg sleep time: 6hr 26 minutes
  • Avg time to fall asleep: 8 minutes
  • Number of days I walked more than 10K steps: 6
  • Avg floors climbed per day: 21.6
  • Times I went to the gym: 4 (sounds low, but it is about 3 times more than I usually go in a month)
  • Days I did food logging: 17
  • Days my food logging can be considered ‘complete’: 7
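For anyone poking at the released files, here is a minimal Python sketch of how summary numbers like the ones above could be computed from daily records. The record layout and field names (`steps`, `floors`, `sleep_min`, `food_logged`) are hypothetical, not the actual format of the Fitbit or Livestrong exports, and the three sample days are made up.

```python
from statistics import mean

# Hypothetical daily records; field names are invented for this sketch.
# sleep_min is None on nights the tracker was not set to record sleep.
days = [
    {"steps": 11200, "floors": 25, "sleep_min": 390, "food_logged": True},
    {"steps": 8400, "floors": 18, "sleep_min": None, "food_logged": False},
    {"steps": 10350, "floors": 22, "sleep_min": 402, "food_logged": True},
]


def summarize(days):
    """Reduce a month of daily records to blog-style summary stats."""
    sleep = [d["sleep_min"] for d in days if d["sleep_min"] is not None]
    avg_sleep = round(mean(sleep))  # whole minutes
    return {
        "days_over_10k_steps": sum(d["steps"] > 10_000 for d in days),
        "avg_floors": round(mean(d["floors"] for d in days), 1),
        "nights_tracked": len(sleep),
        "avg_sleep": f"{avg_sleep // 60}hr {avg_sleep % 60} minutes",
        "days_food_logged": sum(d["food_logged"] for d in days),
    }


print(summarize(days))
```

Skipping untracked nights before averaging matters: counting them as zero sleep would drag the average far below what I actually slept.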

Rescuetime plot of my Facebook usage for January
