Diary for a year – a textual analysis.

This is going to be a bit of a strange blog post, but I’ll see how it turns out.

As I mentioned in my last blog post, I kept a diary for a year. All of it is in text files, sitting in a folder on my hard drive, backed up on various hard drives and in the cloud.

So now what do I do with it?

Well, it’s been handy to look up names of people I’ve met, or places I’ve been, but as time passes that will be less useful.

In 10 years’ time I could read through the whole thing, and see how much of a dick I was, but there’s no way I’m going to read through the whole thing now.

But I want to see how much I can learn about my life when I was aged 30 years old. So here goes!

First step: combine all text files into one. I’ve done that already, using Automator on OSX. It’s handy for stuff like this.
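
Automator did the job here, but the same step could be done in Python too. This is only a minimal sketch, assuming the entries are `.txt` files in one folder and that sorting by filename puts them in date order. The function name is made up for illustration.

```python
from pathlib import Path

def combine_text_files(folder, output_path):
    """Join every .txt file in `folder` into one file, in filename order."""
    output_path = Path(output_path)
    # Collect the file list before opening the output, and skip the output
    # file itself in case it lives in the same folder.
    paths = sorted(
        p for p in Path(folder).glob("*.txt")
        if p.resolve() != output_path.resolve()
    )
    with output_path.open("w", encoding="utf-8") as out:
        for path in paths:
            out.write(path.read_text(encoding="utf-8"))
            # Newline so the last word of one entry does not run into
            # the first word of the next.
            out.write("\n")
    return len(paths)
```

Calling `combine_text_files("diary", "diary_all.txt")` would return the number of files it joined.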

Step two: write a Python script that filters out all punctuation, line breaks, tabs and spaces.

This leaves me with a huge list of over 200,000 words.
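
A rough sketch of what step two might look like, not the actual script. It assumes hyphens are kept and apostrophes are dropped, which is a guess based on entries like "hot-tub" and "mustafas" in the lists further down. The function name is hypothetical.

```python
import re

def extract_words(text):
    """Return a list of lowercase words with punctuation and whitespace removed."""
    text = text.lower()
    # Drop apostrophes (straight and curly) so "Mustafa's" becomes "mustafas".
    text = re.sub(r"['’]", "", text)
    # Turn everything that is not a letter, digit or hyphen into a space.
    # This also covers line breaks and tabs.
    text = re.sub(r"[^\w-]|_", " ", text)
    # Split on whitespace and trim stray hyphens from the ends of words.
    words = (w.strip("-") for w in text.split())
    return [w for w in words if w]
```

`len(extract_words(diary_text))` would then give the total word count, where `diary_text` is the combined diary read into a string.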

Step three: modify script so it counts up how many times I’ve used each word.


Total number of unique words in the diary: 9,608. Is that a lot? I guess it’s a pretty varied vocabulary.

The top 10 most common words:

10449	 i
9310	 the
7244	 and
7115	 to
5633	 a
2871	 it
2705	 in
2573	 of
2327	 my
2280	 but
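
The counting in step three can be sketched with `collections.Counter` from the standard library. This is an assumption about how such a script could work, not the original code, and the function names are invented for the example.

```python
from collections import Counter

def count_words(words):
    """Count how many times each word appears in the list."""
    return Counter(words)

def format_top(counts, n=10):
    """Return the n most common words as 'count<TAB> word' lines."""
    return [f"{count}\t {word}" for word, count in counts.most_common(n)]
```

With this, `len(counts)` is the number of unique words and `format_top(counts)` gives lines in the same shape as the list above.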


Step four: import into a spreadsheet where I can scroll through the words and tag each one as either a Name, a Place, a kind of Food, an Action or an Object. The vast majority of words are none of these, of course.

This is more time-consuming. I decided to ignore all words I only used once or twice, as they make up about two thirds of the 9,608 words. And I’m just not clever enough at Python scripting to do anything like this automagically (and certainly not while disconnected from the internet), so I tagged each word by hand.
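
The tagging itself was done by hand, but the steps around it could be scripted. The sketch below assumes a CSV file with made-up column names `count`, `word` and `tag`: it drops words used fewer than three times, writes the rest out with an empty tag column to fill in by hand, and later reads the tagged file back to rank one category. All function names are hypothetical.

```python
import csv
from collections import Counter

def frequent_words(counts, min_count=3):
    """Return (count, word) rows for words used at least min_count times."""
    return [(count, word) for word, count in counts.most_common()
            if count >= min_count]

def write_tagging_sheet(rows, path):
    """Write the rows to a CSV file with an empty 'tag' column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["count", "word", "tag"])
        for count, word in rows:
            writer.writerow([count, word, ""])

def top_by_tag(path, tag, n=20):
    """Read a hand-tagged sheet and return the top n (count, word) rows for one tag."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.DictReader(f) if row["tag"] == tag]
    rows.sort(key=lambda row: int(row["count"]), reverse=True)
    return [(int(row["count"]), row["word"]) for row in rows[:n]]
```

If about two thirds of the 9,608 unique words drop out, that leaves roughly 3,200 rows to tag. After tagging, something like `top_by_tag("tagged.csv", "Food", n=24)` would produce a list like the food one below.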

The results?

Let’s start with food-related words. I’ll share the top 24.

267	 food
227	 breakfast
126	 dinner
97	 pizza
47	 burger
47	 eat
44	 drinks
42	 lunch
34	 eating
31	 tea
30	 drinking
25	 shots
21	 pasta
21	 tasty
18	 cake
17	 mustafas
16	 cheese
16	 chocolate
16	 hungry
16	 milk
16	 yummy
13	 coffee
11	 crepe
10	 parliamento

I think this is quite educational. I mention “pizza” more times than I mention “lunch”. This doesn’t mean I ate pizza more times than I ate lunch, but I guess pizza is more important for me to record in my diary than one meal of the day.

Burger probably ranks so high because of the 8 Bacon Cheeseburgers in 8 Days project I undertook last September. Since then I’ve eaten more burgers than I normally would in a year, mainly to see if I can find a tastier burger. And, of course, with such a project in mind I’ll write about it in my diary more often.

Mustafa’s Hänchen Gemuse Kebab? The best kebab place in Berlin which happens to be right on my street? 17 visits in the last year, I’m guessing. And 10 trips to Pizza Parliamento, my favorite pizza restaurant near my apartment.

“Tasty” and “yummy” pop up more than I would have thought. I guess “yummy” is a word I’d use more in a diary than normal conversation.

Next set of results: places.


139	 berlin
127	 park
122	 bar
71	 apartment
71	 hotel
65	 ejc
61	 hot-tub
52	 airport
47	 gym
41	 prinsendam
40	 london
39	 bookshop
37	 cabin
32	 hill
27	 cafe
26	 boat
25	 theatre
24	 bank
24	 ubahn
21	 port

Berlin wins, of course. But there’s a lot to learn about me here. “Park” means Victoria Park in Berlin, where I go to juggle every day when the weather is good.

“Bar” is self-explanatory, right?

“Hot-tub”? When the weather is good in Berlin I go hang out in the park. When the weather is good while I’m on a cruise ship, and even when it isn’t, I usually spend an hour per day in the hot-tub and pool. On a sea day I hang out while the sun sets, otherwise I hang out while we sail out of the port.

“EJC” isn’t just a place, but an event, which I mention throughout the year as I was part of the organizing team.

“Gym” in NO WAY means a place where I get fit. Instead it means the gymnasiums at juggling conventions.

The “Prinsendam” is a ship that I perform on six or seven times a year. And other words like “airport”, “cabin”, “boat”, “hotel”, “ubahn”, and “port” just show how big a part of my life travel is.

Next results?

Activities/Verbs. A top 20:

772	 went
568	 show
565	 think
441	 work
427	 juggling
385	 going
265	 chatted
265	 said
259	 sleep
244	 make
239	 played
193	 guess
160	 chatting
157	 met
152	 tried
146	 ate
146	 feel
146	 remember
141	 play
139	 found

This seems pretty standard, I guess. And saying “I guess” might explain why I do so much guessing.

Looking further down the list, I notice “116 sex”. I know for a fact I didn’t have sex 116 times!

And then “97 shower”. I know for a fact I had a shower more than 97 times!

“Combat” is mentioned 90 times. And “juggle” (as opposed to “juggling”) another 83 times, and “juggle” 80 times.

Way down the list is “uploaded” at 47 mentions, but that’s high above “downloaded” at 23 mentions. I guess this shows that uploading new content like podcasts and photography is more important. Or something.

Strangely “photography” only gets 68 mentions. I thought this would be higher, but it’s just down to word choice, I guess. That brings me on to the next set of results…

Things, objects, nouns, etc. The top 20:

375	 bed
298	 photos
267	 food
222	 room
214	 video
205	 ship
197	 music
183	 internet
183	 song
168	 club
152	 book
143	 podcast
139	 stage
130	 head
129	 shows
110	 camera
109	 game
109	 songs
100	 guitar
99	 facebook

See? Photography is very important to me. So is music and performing, and reading, and my online life.

I’m not sure what else I need to mention about this list of words.

And on to the final set: names.


This time, to be a bit more inclusive, I’ll list the top 30.

279	 Julianne
242	 kim-nga
104	 luke
69	 kissha
68	 pola
62	 daniel
54	 declan
49	 olga
48	 eva
44	 karo
43	 alex
39	 doreen
37	 flo
35	 jeff
33	 dj
33	 nathan
33	 scott
31	 rym
29	 john
28	 billy
28	 kyle
26	 christine
26	 jesse
26	 tim
25	 jochen
24	 david
23	 nat
22	 corinna
22	 jessica
22	 jj
22	 lee

And let’s just start at the top. “Juliane” is, of course, my current girlfriend. I met her for the first time at the start of June, so she wins by quite a number of mentions in under three months’ worth of diary.

Second place is “Kim-Nga” who was my girlfriend last year. We were together from October to early January, so about three months again. Though “together” is a funny word for a long distance relationship.

Third place is “Luke” which is me. This is because I addressed many diary entries to my future self, saying “Hey Future Luke, reading back over this diary, here’s what you did today.” This is reflected in the song I wrote last September called Future Luke.

Kissha is a friend in Berlin who I kinda dated in the spring. Pola is my ex-ex-girlfriend who still pops up in my life quite often. But in a good way, as we are still friends.

And then as I look down I see friends I hang out with in Berlin, people I’ve spent time with on cruise ships, people who have stayed at my place, people I’ve been to juggling conventions with, and people I met last year in New York.

Some people don’t feature much in my diary though, even though they feature quite large in my life. I’m not sure why I didn’t mention them more.

This includes:

1. Girls I met in Berlin, with whom I hoped to begin some kind of relationship, but for some reason it didn’t work out. So I’d think about them quite a bit, and mention them every now and then in my diary, but they wouldn’t make it in every time I thought about them, only when I met them, or planned to meet them.

2. People I spent just a few days with on a single trip, who might have changed the direction of my life in a big way, but following that I didn’t meet them again.

3. People whom I chat with on an almost daily basis online, who are just part of my everyday life but I don’t “do” anything with them worth writing about in my diary.

And then some people on this list are there for negative reasons. “Lee” was a very annoying guest entertainer I had the displeasure of spending three weeks with on a cruise in the spring. In fact, I didn’t spend much time with him, I actively avoided him, but the other entertainers kept getting annoyed with him, and all I heard from them were complaints.

Actually, I think Lee is the only negative placement in the above list.

Finally, in the 365 days I was aged 30, I had sex with 5 girls. I’ll not say who they were, but I’m glad they all made the top 30 above.

That’s it! This is such a weirdly abstract way to analyze one’s life, I’m not sure if it is helpful or unhelpful. I don’t think there’s anything else I need to share about my life for a while.

Last note:

The longest “word” in the diary came out as “long-distance-non-dating-friend-with-no-benefits”. This is a specially invented term for Robyn!
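
Finding the longest “word” takes one line in Python. A small sketch, assuming the word list keeps hyphenated terms together as single words (the function name is made up):

```python
def longest_word(words):
    """Return the longest word in the list (the first one found if there is a tie)."""
    return max(words, key=len)
```

This only turns up a term like the one above because the hyphens were not stripped out with the rest of the punctuation.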
