{"id":1390,"date":"2011-08-27T21:40:52","date_gmt":"2011-08-27T20:40:52","guid":{"rendered":"https:\/\/lukeburrage.com\/blog\/?p=1390"},"modified":"2011-08-28T15:44:37","modified_gmt":"2011-08-28T14:44:37","slug":"diary-for-a-year-an-textual-analysis","status":"publish","type":"post","link":"https:\/\/lukeburrage.com\/blog\/archives\/1390","title":{"rendered":"Diary for a year &#8211; a textual analysis."},"content":{"rendered":"<p>This is going to be a bit of a strange blog post, but I&#8217;ll see how it turns out.<\/p>\n<p>As I mentioned in my last blog post, I kept a diary for a year. All of it is in text files, sitting in a folder on my hard drive, backed up on various hard drives and in the cloud.<\/p>\n<p>So now what do I do with it?<\/p>\n<p>Well, it&#8217;s been handy to look up names of people I&#8217;ve met, or places I&#8217;ve been, but as time passes that will be less useful.<\/p>\n<p>In 10 years&#8217; time I could read through the whole thing, and see how much of a dick I was, but there&#8217;s no way I&#8217;m going to read through the whole thing now.<\/p>\n<p>But I want to see how much I can learn about my life when I was 30 years old. So here goes!<\/p>\n<p>First step: combine all text files into one. I&#8217;ve done that already, using Automator on OSX. It&#8217;s handy for stuff like this.<\/p>\n<p>Step two: write a Python script that filters out all punctuation, line breaks, tabs and spaces. <\/p>\n<p><b>This leaves me with a huge list of over 200,000 words.<\/b><\/p>\n<p>Step three: modify the script so it counts up how many times I&#8217;ve used each word. <\/p>\n<p>Easy!<\/p>\n<p><b>Total number of unique words in the diary: 9,608.<\/b> Is that a lot? 
I guess it&#8217;s a pretty varied vocabulary.<\/p>\n<p>The top 10 most common words:<\/p>\n<pre>\r\n10449\t i\r\n9310\t the\r\n7244\t and\r\n7115\t to\r\n5633\t a\r\n2871\t it\r\n2705\t in\r\n2573\t of\r\n2327\t my\r\n2280\t but\r\n<\/pre>\n<p>Boring!<\/p>\n<p>Step four: import into a spreadsheet where I can scroll through the words and tag each one as either a Name, a Place, a kind of Food, an Action or an Object. The vast majority of words are none of these, of course.<\/p>\n<p>This is more time-consuming. I decided to ignore all words I only used once or twice each, as they make up about two-thirds of the 9,608 words. And I&#8217;m just not clever enough at Python scripting to do anything like this automagically (and certainly not while unconnected from the internet) so I tagged each word by hand. <\/p>\n<p>The results?<\/p>\n<p><b>Let&#8217;s start with food-related words.<\/b> I&#8217;ll share the top 24.<\/p>\n<pre>\r\n267\t food\r\n227\t breakfast\r\n126\t dinner\r\n97\t pizza\r\n47\t burger\r\n47\t eat\r\n44\t drinks\r\n42\t lunch\r\n34\t eating\r\n31\t tea\r\n30\t drinking\r\n25\t shots\r\n21\t pasta\r\n21\t tasty\r\n18\t cake\r\n17\t mustafas\r\n16\t cheese\r\n16\t chocolate\r\n16\t hungry\r\n16\t milk\r\n16\t yummy\r\n13\t coffee\r\n11\t crepe\r\n10\t parliamento\r\n<\/pre>\n<p>I think this is quite educational. I mention &#8220;pizza&#8221; more times than I mention &#8220;lunch&#8221;. This doesn&#8217;t mean I ate pizza more times than I ate lunch, but I guess pizza is more important for me to record in my diary than one meal of the day. <\/p>\n<p>Burger probably ranks so high because of the 8 Bacon Cheeseburgers in 8 Days project I undertook last September. Since then I&#8217;ve eaten more burgers than I normally would do in a year, mainly to see if I can find a tastier burger. And, of course, with such a project in mind I&#8217;ll write about it in my diary more often.<\/p>\n<p>Mustafa&#8217;s H\u00e4nchen Gemuse Kebab? 
The best kebab place in Berlin which happens to be right on my street? 17 visits in the last year, I&#8217;m guessing. And 10 trips to Pizza Parliamento, my favorite pizza restaurant near my apartment. <\/p>\n<p>&#8220;Tasty&#8221; and &#8220;yummy&#8221; pop up more than I would have thought. I guess &#8220;yummy&#8221; is a word I&#8217;d use more in a diary than normal conversation.<\/p>\n<p>Next set of results:<\/p>\n<p><b>Places!<\/b><\/p>\n<pre>\r\n139\t berlin\r\n127\t park\r\n122\t bar\r\n71\t apartment\r\n71\t hotel\r\n65\t ejc\r\n61\t hot-tub\r\n52\t airport\r\n47\t gym\r\n41\t prinsendam\r\n40\t london\r\n39\t bookshop\r\n37\t cabin\r\n32\t hill\r\n27\t cafe\r\n26\t boat\r\n25\t theatre\r\n24\t bank\r\n24\t ubahn\r\n21\t port\r\n<\/pre>\n<p>Berlin wins, of course. But there&#8217;s a lot to learn about me here. &#8220;Park&#8221; means Victoria Park in Berlin, where I go to juggle every day when the weather is good.<\/p>\n<p>&#8220;Bar&#8221; is self-explanatory, right? <\/p>\n<p>&#8220;Hot-tub&#8221;? When the weather is good in Berlin I go hang out in the park. When the weather is good while I&#8217;m on a cruise ship, and even when it isn&#8217;t, I usually spend an hour per day in the hot-tub and pool. On a sea day I hang out while the sun sets, otherwise I hang out while we sail out of the port.<\/p>\n<p>&#8220;EJC&#8221; isn&#8217;t just a place, but an event, which I mention throughout the year as I was part of the organizing team.<\/p>\n<p>&#8220;Gym&#8221; in NO WAY means a place where I get fit. Instead it means the gymnasiums at juggling conventions. <\/p>\n<p>The &#8220;Prinsendam&#8221; is a ship that I perform on six or seven times a year. 
And other words like &#8220;airport&#8221;, &#8220;cabin&#8221;, &#8220;boat&#8221;, &#8220;hotel&#8221;, &#8220;ubahn&#8221;, and &#8220;port&#8221; just show how big a part of my life travel is.<\/p>\n<p>Next results?<\/p>\n<p><b>Activities\/Verbs.<\/b> A top 20:<\/p>\n<pre>\r\n772\t went\r\n568\t show\r\n565\t think\r\n441\t work\r\n427\t juggling\r\n385\t going\r\n265\t chatted\r\n265\t said\r\n259\t sleep\r\n244\t make\r\n239\t played\r\n193\t guess\r\n160\t chatting\r\n157\t met\r\n152\t tried\r\n146\t ate\r\n146\t feel\r\n146\t remember\r\n141\t play\r\n139\t found\r\n<\/pre>\n<p>This seems pretty standard, I guess. And saying &#8220;I guess&#8221; might explain why I do so much guessing.<\/p>\n<p>Looking further down the list, I notice &#8220;116 sex&#8221;. I know for a fact I didn&#8217;t have sex 116 times! <\/p>\n<p>And then &#8220;97 shower&#8221;. I know for a fact I had a shower more than 97 times!<\/p>\n<p>&#8220;Combat&#8221; is mentioned 90 times. And &#8220;juggle&#8221; (as opposed to &#8220;juggling&#8221;) another 83 times, and &#8220;juggle&#8221; 80 times.<\/p>\n<p>Way down the list is &#8220;uploaded&#8221; at 47 mentions, but that&#8217;s high above &#8220;downloaded&#8221; at 23 mentions. I guess this shows that uploading new content like podcasts and photography is more important. Or something.<\/p>\n<p>Strangely &#8220;photography&#8221; only gets 68 mentions. I thought this would be higher, but it&#8217;s just down to word choice, I guess. That brings me on to the next set of results&#8230; <\/p>\n<p><b>Things, objects, nouns, etc.<\/b> The top 20:<\/p>\n<pre>\r\n375\t bed\r\n298\t photos\r\n267\t food\r\n222\t room\r\n214\t video\r\n205\t ship\r\n197\t music\r\n183\t internet\r\n183\t song\r\n168\t club\r\n152\t book\r\n143\t podcast\r\n139\t stage\r\n130\t head\r\n129\t shows\r\n110\t camera\r\n109\t game\r\n109\t songs\r\n100\t guitar\r\n99\t facebook\r\n<\/pre>\n<p>See? Photography is very important to me. 
So are music and performing, and reading, and my online life. <\/p>\n<p>I&#8217;m not sure what else I need to mention about this list of words. <\/p>\n<p>And on to the final set&#8230;<\/p>\n<p><b>People!<\/b> <\/p>\n<p>This time, to be a bit more inclusive, I&#8217;ll list the top 30. <\/p>\n<pre>\r\n279\t Julianne\r\n242\t kim-nga\r\n104\t luke\r\n69\t kissha\r\n68\t pola\r\n62\t daniel\r\n54\t declan\r\n49\t olga\r\n48\t eva\r\n44\t karo\r\n43\t alex\r\n39\t doreen\r\n37\t flo\r\n35\t jeff\r\n33\t dj\r\n33\t nathan\r\n33\t scott\r\n31\t rym\r\n29\t john\r\n28\t billy\r\n28\t kyle\r\n26\t christine\r\n26\t jesse\r\n26\t tim\r\n25\t jochen\r\n24\t david\r\n23\t nat\r\n22\t corinna\r\n22\t jessica\r\n22\t jj\r\n22\t lee\r\n<\/pre>\n<p>And let&#8217;s just start at the top. &#8220;Juliane&#8221; is, of course, my current girlfriend. I met her for the first time at the start of June, so she wins by quite a number of mentions in under three months&#8217; worth of diary. <\/p>\n<p>Second place is &#8220;Kim-Nga&#8221; who was my girlfriend last year. We were together from October to early January, so about three months again. Though &#8220;together&#8221; is a funny word for a long distance relationship.<\/p>\n<p>Third place is &#8220;Luke&#8221; which is me. This is because I addressed many diary entries to my future self, saying &#8220;Hey Future Luke, reading back over this diary, here&#8217;s what you did today.&#8221; This is reflected in the song I wrote last September called <a href=\"https:\/\/lukeburrage.com\/blog\/archives\/1074\" target=\"new\">Future Luke<\/a>.<\/p>\n<p>Kissha is a friend in Berlin who I kinda dated in the spring. Pola is my ex-ex-girlfriend who still pops up in my life quite often. 
But in a good way, as we are still friends.<\/p>\n<p>And then as I look down I see friends I hang out with in Berlin, people I&#8217;ve spent time with on cruise ships, people who have stayed at my place, people I&#8217;ve been to juggling conventions with, and people I met last year in New York. <\/p>\n<p>Some people don&#8217;t feature much in my diary though, even though they feature quite large in my life. I&#8217;m not sure why I didn&#8217;t mention them more. <\/p>\n<p>This includes:<\/p>\n<p>1. Girls I met in Berlin, with whom I hoped to begin some kind of relationship, but for some reason it didn&#8217;t work out. So I&#8217;d think about them quite a bit, and mention them every now and then in my diary, but they wouldn&#8217;t make it in every time I thought about them, only when I met them, or planned to meet them.<\/p>\n<p>2. People I spent just a few days with on a single trip, who might have changed the direction of my life in a big way, but following that I didn&#8217;t meet them again.<\/p>\n<p>3. People whom I chat with on an almost daily basis online, who are just part of my everyday life but I don&#8217;t &#8220;do&#8221; anything with them worth writing about in my diary. <\/p>\n<p>And then some people on this list are there for negative reasons. &#8220;Lee&#8221; was a very annoying guest entertainer I had the displeasure of spending three weeks with on a cruise in the spring. In fact, I didn&#8217;t spend much time with him, I actively avoided him, but the other entertainers kept getting annoyed with him, and all I heard from them were complaints. <\/p>\n<p>Actually, I think Lee is the only negative placement in the above list. <\/p>\n<p>Finally, in the 365 days I was aged 30, I had sex with 5 girls. I&#8217;ll not say who they were, but I&#8217;m glad they all made the top 30 above. <\/p>\n<p><b>That&#8217;s it!<\/b> This is such a weirdly abstract way to analyze one&#8217;s life, I&#8217;m not sure if it is helpful or unhelpful. 
I don&#8217;t think there&#8217;s anything else I need to share about my life for a while.<\/p>\n<p>Last note:<\/p>\n<p>The longest &#8220;word&#8221; in the diary came out as &#8220;long-distance-non-dating-friend-with-no-benefits&#8221;. This is a specially invented term for Robyn!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is going to be a bit of a strange blog post, but I&#8217;ll see how it turns out. As I mentioned in my last blog post, I kept a diary for a year. All of it is in text files, &hellip; <a href=\"https:\/\/lukeburrage.com\/blog\/archives\/1390\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[30,51,14,50,37,10,33,12,11,9,52],"tags":[],"_links":{"self":[{"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/posts\/1390"}],"collection":[{"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/comments?post=1390"}],"version-history":[{"count":6,"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/posts\/1390\/revisions"}],"predecessor-version":[{"id":1398,"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/posts\/1390\/revisions\/1398"}],"wp:attachment":[{"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/media?parent=1390"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/categories?post=1390"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lukeburrage.com\/blog\/wp-json\/wp\/v2\/tags?post=1390"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}