

Pruning Vines at my Farm Startup 
[Agriculture] 
Posted on April 26, 2013 @ 07:48:00 AM by Paul Meagher
It took me two and a half days to finish planting my trees and shrubs and pruning my vines. I was not planning on pruning my vines, as I thought I would leave well enough alone. However, when I inspected the vines closely I noticed a pattern of dead growth at the tops of the vines. The vines die back a bit over the winter. I decided I would prune the tops back to a viable bud and also prune the vines back to 1 to 3 main shoots. Year 2 for the vines will be mostly about getting better rooted and getting them trained to hang off my trellis wires properly. You usually want 2 main shoots, with one shoot going left and one shoot going right along your guide wire.
Grapevine pruned back to 2 main shoots
It took about 3 and a half hours to prune all my vineyard plants. Ideally you would do this during the winter months when the plants are dormant; however, that is not a very pleasant job at that time of year. Bob Osborne at Corn Hill Nursery, where I purchased my apple/pear/blueberry stock, talked about the last two weeks of April as being an ideal time to do grafting work, so I assumed that the same timeframe applies to vine pruning as well.
2 yr old vineyard is pruned.
I'm hoping to expand my vineyard this year. My neighbor at the farm, A.J. Taylor, provided plowing services last fall. He plowed 8 strip tills which I will rototill in early summer and plant my vines into.
Strip tills for grape vines with gutters draining water.
I finished planting my vine cuttings in my home greenhouse nursery over the last weekend. I planted 8 to 9 hundred cuttings in my 12ft x 15ft sunken greenhouse. This is my second year nursing vines in my greenhouse. My vines were well rooted when I eventually planted them out last year.
Densely packed vines at my home nursery.
It takes a while for a farm startup to get to the point of producing enough farm product to generate significant income. Apple trees and pear trees could take up to 10 years before they start fruiting significantly. Vines provide a faster return, as they can start producing in their third year. Then there are the annual crops that can give you more immediate returns. So far, I have plans to plant surplus amounts of potatoes and squash. I don't plan on getting rich off these crops, but if I am successful, I'd be more willing the next year to scale up my production to a level where it could be profitable. I also offer a couple of vacation rental units at the farm that cover some costs. The burn rate for the last couple of years has been quite high in order to repair buildings, get the rental units ready, and acquire machinery and tools. I have been able to use these expenses as much-needed tax write-offs (I only claim a hobby farm amount for now and don't try to spend much above that amount), but eventually a farm has to be profitable or the tax man will look unfavorably upon the enterprise.

Planting Apple Trees 
[Agriculture] 
Posted on April 23, 2013 @ 02:27:00 PM by Paul Meagher
I'll be juggling my farm startup business with my online dealflow interests over the next couple of days. What that means is that I am in the middle of planting approx. 80 trees and shrubs: apple trees, pear trees, and highbush blueberry bushes. I drove 225 km from Truro, Nova Scotia to Mabou, Cape Breton this morning to do the planting.
I have a row of 18 Honeycrisp apple trees planted and hope to get 40 apple trees planted today (varieties include Honeycrisp 2 yr, Cortland 1 yr, Sunrise 1 yr, and Cox Pippin 1 yr). It is 5:00 and my planting deadline is 8:30 tonight. Rain is forecast, so I'll be planting in a mild rain tomorrow with warm temps (9-14 degrees Celsius). The rain will provide the required watering for the newly planted 1 and 2 year old apple trees, 1 yr old pear trees, and 20 blueberry bushes.
Getting the tractor geared up for tree planting.
Inspected the 300-400 vines I planted last year. They appear to be doing ok and I'm looking forward to seeing how they grow this year. There are significant varietal differences: different grape varieties have differing degrees of vigor in my soil. I'm curious to see if the most vigorous variety could start producing this year, 1 year after planting. It usually takes 3 years, and probably will, but a grape vine can put on 6 to 9 feet of growth in a season if it is in good soil, sun, and ambient conditions. You are supposed to pinch the grapes off this year to let the roots get more of the joy juice, but on some of my vigorous plants I might see what they will do without pinching.
2 yr old unpruned vines in early spring
There is some evidence of snow damage on my vines. There was a heavy layer of snow at the bottom of my field, and the melting action broke vine guide poles (bamboo) and, in some cases, the vines themselves. Only the vines on the hill towards a forested corner of the field were subject to this damage, perhaps because of the vortex winds that develop in this unique wind/snow environment. During the fall, winter and early spring there are heavy winds on this maritime ridgetop grasslands farm. During late spring, summer, and autumn the winds are generally pleasant, but occasionally the winds will kick in again (sou'wester winds during summer). Based upon wintertime experiences I thought I could run a wind turbine here and generate a lot of power; however, I discovered that over the late spring to autumn period the winds are quite pleasant.
View of farmstead from orchard
As I do my manual labor, I am reflecting upon the role of Bayesian Inference for angel investors and entrepreneurs. According to lean startup theory, a startup is defined by the level of uncertainty in its operations. So if operational uncertainty is the defining aspect of what a startup is, then how do we go about representing, understanding, and managing that uncertainty? Does Bayesian Inference offer a formal foundation for lean startup theory?
Back to planting.

Bayesian Entrepreneurship 
[Bayesian Inference] 
Posted on April 18, 2013 @ 09:15:00 AM by Paul Meagher
A bit of housekeeping first. To keep track of my discussion of topics related to Bayesian Inference, I have created a blog category called "Bayesian Inference". You can click on the category link Bayesian Inference to see how my earlier blogs prepare the groundwork for my later blogs on Bayesian inference. If you are new to this topic, I recommend reading my oldest Bayesian inference blog first and then reading each one up to my most recent Bayesian inference blog. Later blogs build on earlier blogs.
To date I have focused on Bayesian Angel Investing and have offered up some ideas and code as to how Bayesian inference might be applied to angel investing. After introducing the idea of Bayesian Angel Investing I then offered a classification framework for Bayesian Angel Investing. This was followed by a blog on measuring classification accuracy. I then introduced some foundational concepts in Bayesian inference such as conditional probability, prior probability, Bayes Theorem, and the concept and calculation of likelihoods. My last blog discussed a Bayes Wizard application that computes conditional probabilities of startup success and failure according to Bayes Theorem. It was meant to tie some of these foundational concepts together into a simple and useful web application.
Bayesian inference techniques are not limited to helping angel investors optimize their investment decisions; they can also be used by entrepreneurs to optimize their startup decision making. For example, entrepreneurs must make decisions about how they should invest their startup capital in order to maximize their return on investment. Imagine that you are a new farmer and must make a decision about whether to invest in buying wheat seed for the upcoming growing season. To make an optimal decision here you might begin by estimating the joint probability of getting 28 cm or more of rain during the wheat growing season AND that your wheat yield will be 7800 kg/ha or more. You might estimate this value by tallying the number of instances of (rain >= 28 cm and wheat yield >= 7800 kg/ha) and dividing this by the total number of observations you have on rain amount and wheat yield. Let's assume that P(R>=28 cm & Y>=7800 kg/ha) = .18. From historical records you might also estimate the probability of getting a rain amount >= 28 cm to be .21. Now using our definition of conditional probability, P(H|E) = P(H & E) / P(E), we calculate P(Y>=7800 kg/ha | R>=28 cm) as follows:
P(Y>=7800 kg/ha | R>=28 cm) = P(R>=28 cm & Y>=7800 kg/ha) / P(R>=28 cm) = .18/.21 = .86
This tells us that the probability of getting a good yield from our wheat is fairly high if we get 28 cm or more of rain during the wheat growing season. The probability of getting 28 cm of rain or more is, however, only .21, so we might want to examine other rainfall amounts and yield amounts to see if there is a good yield value for a more probable rainfall amount. This is how a startup farmer might go about making an optimal decision regarding whether to purchase wheat seed for the upcoming growing season. It might be noted that there is a very high correlation between rainfall amounts and wheat yield (correlation coefficient of .95), so of all the variables that a farmer might take into account in making a seed purchase decision, the relationship between rainfall amounts and wheat yields is a particularly important one to examine when projecting a probable return on investment. Don't waste your time calculating probabilities based upon factors that don't really matter that much.
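The tallying procedure described above can be sketched in PHP. This is only an illustration: the observations are made up (they are not real rainfall or yield records) and the function name estimateConditional is hypothetical.

```php
<?php
/**
 * Estimates P(yield >= $minYield | rain >= $minRain) by set enumeration.
 * Each record is array('rain' => cm, 'yield' => kg/ha).
 * Returns null when no record meets the rain threshold.
 */
function estimateConditional(array $records, float $minRain, float $minYield): ?float
{
    $numRain = 0;         // records with rain >= threshold
    $numRainAndYield = 0; // records meeting both thresholds

    foreach ($records as $r) {
        if ($r['rain'] >= $minRain) {
            $numRain++;
            if ($r['yield'] >= $minYield) {
                $numRainAndYield++;
            }
        }
    }

    return $numRain > 0 ? $numRainAndYield / $numRain : null;
}

// Hypothetical observations, for illustration only.
$records = [
    ['rain' => 30, 'yield' => 8100],
    ['rain' => 29, 'yield' => 7900],
    ['rain' => 28, 'yield' => 7500],
    ['rain' => 22, 'yield' => 6400],
    ['rain' => 18, 'yield' => 5200],
];

$p = estimateConditional($records, 28, 7800);
printf("P(Y>=7800 | R>=28) = %.2f\n", $p); // 2 of the 3 wet seasons: 0.67
```

Changing the two thresholds lets you scan other rainfall and yield combinations, which is the exploration the paragraph above recommends.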
There are two ways to make decisions: analytically or non-analytically. Making a decision analytically requires the quantification of the main elements in your decision problem so that you can compute answers. The main reason entrepreneurs might want to bother with analytic decision making is if they can make better decisions by adopting an analytical approach versus a non-analytical approach (perhaps "intuitive" would be a more favorable word to use). In some ways this dichotomy is false because most "analytic" decisions involve a combination of analytic and intuitive problem solving; however, it is worth emphasizing the distinction because the role of analysis in entrepreneurial decision making is not an aspect of entrepreneurship that is discussed much. It is worth examining whether Bayesian inference techniques are useful for entrepreneurs to learn, that is, whether they lead to more success. It is difficult to say whether this is true or not because the idea of Bayesian entrepreneurship has not been studied or promoted to date. Maybe this blog will help change this state of affairs by offering some instruction on how Bayesian inference techniques might be applied in entrepreneurial decision making.

Top ten wheat producers, 2011 (million metric tons)

| Country | Production |
| People's Republic of China | 117 |
| India | 86 |
| Russia | 56 |
| United States | 54 |
| France | 38 |
| Canada | 25 |
| Pakistan | 25 |
| Australia | 24 |
| Germany | 22 |
| Kazakhstan | 22 |
| World total | 469 |

Source: UN Food & Agriculture Organisation (FAO)


A Bayes Wizard for Predicting Startup Success 
[Bayesian Inference] 
Posted on April 17, 2013 @ 12:30:00 PM by Paul Meagher
In my last blog I showed how to compute the likelihood term P(E|H) in Bayes formula, which is shown below:
P(H|E) = P(E|H) * P(H) / P(E)
In today's blog we will be using the likelihood values we previously computed in order to predict startup success based upon the evidence of two diagnostic tests, P(H|E). Here is the data table we created in the last blog, with likelihoods appearing in parentheses.

| Outcome | # Startups | Tests ++ | Tests +- | Tests -+ | Tests -- |
| S | 1200 | 650 (.54) | 250 (.21) | 250 (.21) | 50 (.04) |
| U | 8800 | 100 (.01) | 450 (.05) | 450 (.05) | 7800 (.89) |
| Total | 10,000 | | | | |


This data table provides us with all the information we need in order to use Bayes Theorem to predict the probability of startup success given evidence from two diagnostic tests. To compute the posterior probabilities for each hypothesis given different evidence patterns, we will use a simple bayes_wizard.php script. Let me show you how it works.
When we point our browser at the bayes_wizard.php script (in a PHP-enabled web folder), the first screen asks us to input the number of hypotheses and test labels.
The next screen asks us to input the labels for the hypotheses and tests. We use S to mean successful startup and U to mean unsuccessful startup. We use ++ to indicate a positive outcome on two diagnostic tests, -- to indicate a negative outcome on two diagnostic tests, and so on.
Next we are asked to enter the prior probability of the different hypotheses (i.e., P(H=S) and P(H=U)). These are just the fractions of the 10,000 startups classified as successful or unsuccessful.
The next screen asks us to input the likelihood for each combination of test and hypothesis. We enter the likelihoods we computed in our last blog in this screen (see the values in parentheses in the table above).
The final screen displays the posterior probabilities for each hypothesis given each evidence pattern.
The way to interpret this table is to examine each row separately. In the first row, where we have two diagnostic tests with positive outcomes, we see that the posterior probability that the startup is successful is significantly higher (.88) than the probability that the startup is unsuccessful (.12). So, a startup exhibiting this pattern of diagnostic evidence is quite likely to be successful. Our posterior probability calculation allows us to move from an initial estimate of 12 percent probability of startup success to an 88 percent probability of startup success.
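For readers who want to check the posterior values by hand, here is a minimal sketch of the calculation in PHP. It is not the bayes_wizard.php code itself: the function name posteriorTable and the array layout are assumptions made for illustration, and the inputs are the priors and rounded likelihoods from the data table above.

```php
<?php
/**
 * Returns the posteriors P(H|E) for every evidence pattern.
 * $priors:      array('S' => 0.12, 'U' => 0.88)
 * $likelihoods: array('S' => array('++' => 0.54, ...), 'U' => array(...))
 */
function posteriorTable(array $priors, array $likelihoods): array
{
    $table = [];
    $patterns = array_keys(reset($likelihoods));

    foreach ($patterns as $e) {
        // P(E) = sum over hypotheses of P(E|H) * P(H)
        $pE = 0.0;
        foreach ($priors as $h => $pH) {
            $pE += $likelihoods[$h][$e] * $pH;
        }
        // Bayes theorem: P(H|E) = P(E|H) * P(H) / P(E)
        foreach ($priors as $h => $pH) {
            $table[$e][$h] = $likelihoods[$h][$e] * $pH / $pE;
        }
    }

    return $table;
}

$priors = ['S' => 0.12, 'U' => 0.88];
$likelihoods = [
    'S' => ['++' => 0.54, '+-' => 0.21, '-+' => 0.21, '--' => 0.04],
    'U' => ['++' => 0.01, '+-' => 0.05, '-+' => 0.05, '--' => 0.89],
];

$table = posteriorTable($priors, $likelihoods);
foreach ($table as $e => $row) {
    printf("%-3s P(S|E)=%.2f  P(U|E)=%.2f\n", $e, $row['S'], $row['U']);
}
// The first row (++) gives P(S|E)=0.88 and P(U|E)=0.12, matching the text.
```

Each row sums to 1 because dividing by P(E) normalizes the products of likelihood and prior.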
The diagnostic tests that might be used could be anything that might be predictive of startup success. We could, for example, assess a startup's business plan with respect to a checklist of desirable attributes and score it as pass (+) or fail (-). The Bayes Wizard allows you to specify as many tests and hypotheses as you want. It is up to you to come up with the hypotheses you want to examine and the number and kind of tests you want to use. You should look for empirical information about the covariation between your tests and outcomes so that you can compute the required likelihood terms.
If you have been following my last few blogs, you should now have a good sense of how you can begin to use Bayes inference to arrive at better Angel Investment decisions. If you want to see how the wizard works under the hood and how the Bayes theorem calculation is implemented, you can download the code from my GitHub account.
https://github.com/mrdealflow/BAYES

Computing the Likelihood of Startup Success 
[Bayesian Inference] 
Posted on April 16, 2013 @ 07:51:00 AM by Paul Meagher
There are many ways to compute a conditional probability such as P(H|E).
The simplest way to compute P(H|E) is:
P(H|E) = P(H & E) / P(E)
In my last blog introducing Bayes Theorem, I showed how to rearrange terms so that you could compute P(H|E) using a version of the conditional probability formula called Bayes Theorem:
P(H|E) = P(E|H) * P(H) / P(E)
I also showed that this equation could be further simplified to:
P(H|E) ~ P(E|H) * P(H)
Where the symbol ~ means "is proportional to". The equation says that the probability of a hypothesis given evidence, P(H|E), is proportional to the likelihood of the evidence given the hypothesis, P(E|H), multiplied by a prior assessment of the probability of our hypothesis, P(H). The likelihood term plays a critical role in updating our prior beliefs. So how is it computed and what does it mean? That is what will be discussed today.
Below I have fabricated a data table consisting of 10,000 startups classified as successful S (1200 instances) or unsuccessful U (8800 instances). In a previous blog, I reported a finding that claimed the success rate of first-time startups is 12%, which equates to 1200 instances out of 10,000. The data table also includes the outcome of two diagnostic tests. A positive outcome on both tests is denoted ++, while a negative outcome on both tests is denoted --. Each cell displays a joint frequency value and a corresponding likelihood value for the relevant combination of diagnostic tests and startup outcomes.

| Outcome | # Startups | Tests ++ | Tests +- | Tests -+ | Tests -- |
| S | 1200 | 650 (.54) | 250 (.21) | 250 (.21) | 50 (.04) |
| U | 8800 | 100 (.01) | 450 (.05) | 450 (.05) | 7800 (.89) |
| Total | 10,000 | | | | |


Computing a likelihood from this data table is actually a simple calculation involving the formula:
P(E|H) = P(H & E) / P(H)
To calculate the likelihood of two positive tests given that a startup is successful, P(E=++|H=S), we divide the joint frequency of the evidence E=++ when a startup is successful H=S (which is 650) by the frequency of startup success H=S (which is 1200). So 650/1200 is equal to .54, which is the value in parentheses beside 650 in the table above. To calculate the likelihood of two positive tests given that a startup is unsuccessful, P(E=++|H=U), we divide the joint frequency of the evidence E=++ when a startup is unsuccessful H=U (which is 100) by the frequency that a first-time startup is unsuccessful H=U (which is 8800). So 100/8800 is equal to .01, which is the value in parentheses beside 100 in the table above.
The likelihood calculation tells us which hypothesis makes the evidence most likely. In this case, the hypothesis that the startup is successful makes the positive outcome of our two diagnostic tests (E=++) more likely (.54) than the hypothesis that the startup is unsuccessful (.01). We can examine the likelihood values in each column to determine which hypothesis makes the diagnostic evidence more likely. You can see why the likelihood values are important in updating our prior beliefs about the probability of startup success. We can also appreciate why some would argue that likelihood values are sufficient for making decisions: just compare how likely the evidence is under each of the different hypotheses.
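The likelihood calculation can be sketched in a few lines of PHP. This is an illustrative sketch rather than code from the Bayes Wizard: the function name likelihoods and the array layout are assumptions, and the counts are the joint frequencies from the fabricated table above.

```php
<?php
/**
 * Converts joint frequency counts into likelihoods P(E|H).
 * $counts: array('S' => array('++' => 650, ...), 'U' => array(...))
 */
function likelihoods(array $counts): array
{
    $result = [];
    foreach ($counts as $h => $row) {
        $total = array_sum($row); // frequency of hypothesis H
        if ($total == 0) {
            continue; // no observations for this hypothesis
        }
        foreach ($row as $e => $n) {
            // P(E|H) = frequency of (H & E) / frequency of H
            $result[$h][$e] = $n / $total;
        }
    }
    return $result;
}

$counts = [
    'S' => ['++' => 650, '+-' => 250, '-+' => 250, '--' => 50],
    'U' => ['++' => 100, '+-' => 450, '-+' => 450, '--' => 7800],
];

$l = likelihoods($counts);
printf("P(++|S) = %.2f\n", $l['S']['++']); // 650/1200 = 0.54
printf("P(++|U) = %.2f\n", $l['U']['++']); // 100/8800 = 0.01
```

The likelihoods in each row sum to 1, since every startup with a given outcome shows exactly one of the four test patterns.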

Introduction to Bayes Theorem 
[Bayesian Inference] 
Posted on April 12, 2013 @ 11:18:00 AM by Paul Meagher
In this blog, I'll be doing a bit of algebra to show you that our conditional probability formula P(H|E) = P(H & E) / P(E) is equivalent to P(H|E) = P(E|H) * P(H) / P(E). This latter form of the equation is the version that people most often refer to as Bayes theorem. They are mathematically equivalent; however, in different circumstances it is easier to work with one versus the other. A Bayesian Angel Investor will need to master this Bayes theorem version of the conditional probability equation. This version of the equation includes a term P(E|H), called the likelihood term, which is also critical for a Bayesian Angel Investor to understand and master. We will briefly discuss this term, leaving a more detailed discussion until next week when I will dedicate a blog to the likelihood concept.
The derivation of Bayes theorem follows naturally from the definition of conditional probability:
P(H|E) = P(H & E) / P(E)
Using some simple algebra (moving terms from one side to the other), this equation can be rewritten as:
P(H & E) = P(H|E) * P(E)
The same joint probability can also be computed using H as the conditioning variable on the right-hand side of the equation:
P(H & E) = P(E|H) * P(H)
Given this equivalence, you can write:
P(H|E) * P(E) = P(E|H) * P(H)
We can now substitute P(E|H) * P(H) for P(H & E) in the definition of conditional probability and arrive at Bayes theorem:
P(H|E) = P(E|H) * P(H) / P(E)
Notice that this formula for computing a conditional probability is similar to the original formula, with the exception that the joint probability P(H & E) that used to appear in the numerator has been replaced with an equivalent expression, P(E|H) * P(H).
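A quick numerical check can make the equivalence concrete. The sketch below is only an illustration: it borrows counts from the fabricated table of 10,000 startups used in the likelihood post (H = successful startup, E = two positive tests), and the variable names are my own.

```php
<?php
// Counts taken from the fabricated 10,000-startup table (an assumption
// made for this illustration): 1200 successes, 650 of which tested ++,
// plus 100 unsuccessful startups that also tested ++.
$n   = 10000;     // all startups
$nH  = 1200;      // successful startups
$nE  = 650 + 100; // startups with two positive tests
$nHE = 650;       // successful AND two positive tests

$pH       = $nH / $n;    // prior P(H)
$pE       = $nE / $n;    // P(E)
$pHE      = $nHE / $n;   // joint P(H & E)
$pEgivenH = $nHE / $nH;  // likelihood P(E|H)

$viaDefinition = $pHE / $pE;            // P(H & E) / P(E)
$viaBayes      = $pEgivenH * $pH / $pE; // P(E|H) * P(H) / P(E)

printf("Definition:    %.4f\n", $viaDefinition); // 0.8667
printf("Bayes theorem: %.4f\n", $viaBayes);      // 0.8667
```

Both routes give 650/750, because P(E|H) * P(H) is just another way of writing the joint probability P(H & E).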
We can simplify this equation further by pointing out that P(E), the probability of the evidence, acts as a normalizing term that ensures that when we compute all our conditional probabilities P(H|E), they collectively sum to 1. Conceptually, we can eliminate it from our equation by making the weaker claim that P(H|E) is proportional to P(E|H) * P(H):
P(H|E) ~ P(E|H) * P(H)
What this simplified equation is saying is that the probability of a hypothesis (e.g., startup success) given the evidence (e.g., tests diagnostic of startup success) is proportional to the likelihood of the evidence P(E|H) times the prior probability of the hypothesis P(H). When making decisions, we don't necessarily need to know the probability of success exactly, just that the success probability is quite a bit bigger than the failure probability. This is why this simpler version of Bayes theorem is still useful even though it only expresses a proportional relationship and not a full identity.
In order to update our prior probability of first-time startup success from .12 (or 12%) given the evidence of some diagnostic tests, we need to multiply our prior assessment of first-time startup success P(H) by a factor called the likelihood P(E|H). The likelihood term is obviously doing a lot of the heavy lifting in terms of updating our prior beliefs.
In my next blog, I will discuss how likelihoods can be computed from a data table using the conditional probability equation P(E|H) = P(E & H) / P(H) and other techniques. Some statisticians argue that likelihoods are good enough for decision making, that you don't have to incorporate prior probabilities P(H) into calculations to figure out the most probable outcome. These statisticians are afraid of introducing a subjective element (e.g., your prior assessment P(H) of the relative probability of different outcomes) into decision making. Bayesians argue that this subjective element makes the probability calculations more intelligent and contextually sensitive. An angel investor with lots of business experience should have at their disposal a mathematical tool that allows them to use their experience in making startup investment decisions. Bayesian inference techniques offer the promise of being that tool.

Prior Probability of Startup Success 
[Bayesian Inference] 
Posted on April 10, 2013 @ 09:26:00 AM by Paul Meagher
One of the pieces of data you should have in your mind as a Bayesian Angel Investor is the prior probability that a startup will be successful. According to Funders and Founders the success rate for first time startups is 12%, going up to 20% if the founder failed in their first effort, and up to 30% if they are a veteran (3 or more kicks at the can).
One way to look at this data is that the success percentages for a startup go up from 12% if you conditionalize your estimate on knowledge about how many times the founder has attempted to start a company. So the number of startup attempts could be viewed as one evidence factor to consider when evaluating whether a company will be successful or not.
Another aspect of this data to note is that while 12% may seem like a small percentage, it is not so small (say 1%) that new knowledge is going to keep the conditional probabilities so low that you cannot make a confident decision. Early screening for breast cancer (e.g., at age 40) is difficult, in part, because the base rate of breast cancer at age 40 is so low (1%) that even if you do have a fairly good test (80% true positive rate), and that test is positive, it will only increase the probability of a cancer diagnosis to approx. 8%. With a 12% success rate for first-time startups, we can potentially increase our estimate of a company's success rate by quite a bit by taking into account other information about the company. Two good diagnostic tests applied in sequence could get us up to an 80% probability estimate of startup success and increase the likelihood that you will make a good angel investment decision.
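The effect of the base rate can be sketched with a short PHP function. Note the assumptions: the text gives the base rate and the true positive rate but not the false positive rate, so the 9.6% figure below is an assumed value chosen for illustration, the function name updateOnPositive is hypothetical, and the two startup tests are imaginary tests of the same quality that are treated as independent given the outcome.

```php
<?php
/**
 * One Bayesian update for a yes/no hypothesis after a positive test.
 * Returns P(H | positive test).
 */
function updateOnPositive(float $prior, float $truePositiveRate, float $falsePositiveRate): float
{
    $numerator = $truePositiveRate * $prior;                    // P(+|H) * P(H)
    $evidence  = $numerator + $falsePositiveRate * (1 - $prior); // P(+)
    return $numerator / $evidence;
}

// Screening example: 1% base rate, 80% true positive rate.
// The 9.6% false positive rate is an ASSUMPTION, not stated in the text.
$cancer = updateOnPositive(0.01, 0.80, 0.096);
printf("P(cancer | positive test) = %.3f\n", $cancer); // 0.078

// Startup example: 12% prior, two hypothetical tests applied in sequence.
// The posterior after the first test becomes the prior for the second.
$afterOne = updateOnPositive(0.12, 0.80, 0.096);
$afterTwo = updateOnPositive($afterOne, 0.80, 0.096);
printf("After one positive test:  %.2f\n", $afterOne);
printf("After two positive tests: %.2f\n", $afterTwo);
```

With these assumed test characteristics the startup estimate climbs well past 50% after the second positive result, whereas the same single test leaves the 1% base rate case below 10%.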
Research shows that even doctors are not very good at taking base rates (i.e., priors) into account and put too much emphasis upon the test accuracy to arrive at conditional probability estimates for a diagnosis. Their estimates can be improved considerably if, instead of being given information in a probability format (0.12 probability of first-time startup success), the information is presented in a frequency format (120 out of 1000 first-time startups are successful). Sticking with numbers as frequency counts allows us to mentally compute more accurate conditional probabilities.

Conditional Probability of Startup Success 
[Bayesian Inference] 
Posted on April 8, 2013 @ 07:39:00 AM by Paul Meagher
In this blog post, I'll be going over the concept of Conditional Probability (i.e., P(H|E)). I'll be reusing some of my earlier writings on Bayesian inference using a medical example and substituting in an angel investing example. The concept of conditional probability is central to Bayesian inference. A Bayesian angel investor is always computing the probability of some hypothesis given some pattern of evidence P(H|E). There are many mathematical techniques you can use to compute a conditional probability P(H|E), but the simplest way involves set enumeration and it is what clergyman Thomas Bayes had in mind when he proposed his new method of inference. So hopefully you will learn one important method for computing a conditional probability from reading this blog post.
Imagine that H refers to "Company is Successful" and E refers to "Quality Business Plan". P(H | E) would then read as the "probability that a company is successful (H) given that they have a quality business plan (E)." If H tends to occur when E occurs, then knowing that E has occurred allows you to assign a higher probability to H's occurrence than in a situation in which you did not know that E occurred.
More generally, if H and E systematically covary in some way, then P(H | E) will not be equal to P(H). Conversely, if H and E are independent events, then P(H | E) would be expected to equal P(H).
The need to compute a conditional probability thus arises any time you think the occurrence of some event has a bearing on the probability of another event's occurring.
The most basic and intuitive method for computing P(H | E) is the set enumeration method. Using this method, P(H | E) can be computed by counting the number of times H and E occur together {H & E} and dividing by the number of times E occurs {E}:
P(H | E) = {H & E} / {E}
If you gave your ok to 12 business plans to date, and observed that 10 of those companies were successful, then P(H | E) would be estimated at 10/12 or 0.833. In other words, the probability of a company being successful given that they have a quality business plan can be estimated at 83 percent by using a method that involves enumerating the relative frequencies of H and E events from the data gathered to date.
Computing a conditional probability becomes a form of inference when we take into account that the prior probability P(H) that a startup would be successful was probably lower than 83 percent. So conditionalizing our hypothesis (company will succeed) on other information (business plan quality) helped to increase our estimate of the probability that a startup would be successful. We can make decisions to proceed further based upon this improved knowledge.
You can compute a conditional probability using the set enumeration method with the PHP code below.
<?php
/**
 * @script conditional_probability.php
 * @author: paul@datavore.com
 *
 * @purpose: Illustrates how to compute a conditional probability
 * using set enumeration
 */

/**
 * Returns conditional probability of $A given $B and $Data.
 * $Data is an indexed array. Each element of the $Data array
 * consists of an A measurement and B measurement on a sample
 * item. Returns null if $B never occurs in $Data.
 */
function getConditionalProbabilty($A, $B, $Data) {
  $NumAB   = 0;
  $NumB    = 0;
  $NumData = count($Data);
  for ($i = 0; $i < $NumData; $i++) {
    if (in_array($B, $Data[$i])) {
      $NumB++;
      if (in_array($A, $Data[$i])) {
        $NumAB++;
      }
    }
  }
  // avoid dividing by zero when the conditioning event never occurs
  if ($NumB == 0) {
    return null;
  }
  return $NumAB / $NumB;
}

/**
 * The elements of the $Data array use this coding convention:
 *
 * +success - company is successful
 * -success - company is not successful
 * +bizplan - bizplan passed quality test
 * -bizplan - bizplan failed quality test
 */
$Data[0]  = array("+success", "+bizplan");
$Data[1]  = array("+success", "+bizplan");
$Data[2]  = array("+success", "+bizplan");
$Data[3]  = array("+success", "+bizplan");
$Data[4]  = array("+success", "+bizplan");
$Data[5]  = array("+success", "+bizplan");
$Data[6]  = array("+success", "+bizplan");
$Data[7]  = array("+success", "+bizplan");
$Data[8]  = array("+success", "+bizplan");
$Data[9]  = array("+success", "+bizplan");
$Data[10] = array("-success", "+bizplan");
$Data[11] = array("-success", "+bizplan");

// specify query variable $A and conditioning variable $B
$A = "+success";
$B = "+bizplan";

// compute the conditional probability of startup success given
// 1) a passing business plan and 2) a sample of covariation data
$probability = getConditionalProbabilty($A, $B, $Data);

echo "P($A|$B) = $probability";

// P(+success|+bizplan) = 0.83333333333333
?>

Statistic Brain 
[Statistics] 
Posted on April 5, 2013 @ 05:57:00 AM by Paul Meagher
There are some University of Tennessee Research statistics on startup failure rates by industry at Statistic Brain.
Statistic Brain looks to be quite a useful and entertaining resource for those seeking a thrill with numbers.
From the site:
Statistic Brain is a group of passionate number people. We love numbers, their purity, and what they represent. Numbers can bring humans together, they tell us how we are alike and how we are beautifully unique. Numbers are a way to reflect on how far we’ve come and give us hope for the future.
Our goal is to bring you accurate and timely statistics. We will never become number analysts because we believe numbers should only be interpreted by the reader. We want to educate, assist, and sometimes entertain with numbers on every subject.
We hope that today you learn something new, find inspiration for tomorrow, and use your knowledge for something good.
Seth Harden
CEO / Founder

Accurate Classification of Startup Success 
[Bayesian Inference] 
Posted on April 5, 2013 @ 01:49:00 AM by Paul Meagher
In my last blog introducing a classification framework for Bayesian Angel Investing, I discussed a PHP-based software class called ClassifierDiagnostics.php. I showed how you enter bivariate data points into it and the type of output it displays. I didn't go into much detail on what the output is telling us. Today I will go into some more detail on what the output is telling us and start to give some indication as to why it is important if you want to be a successful Bayesian Angel Investor.
One way to formulate the problem of Bayesian Angel Investing is as a classification problem where an investor is trying to assign a probability to whether a startup belongs to the class of "Successful" (S) companies or "Unsuccessful" (U) companies. One way to do this would be to just rely upon the prior odds of a startup being successful or not. You would not conditionalize the probability assignments (e.g., P(S) = θ1, P(U) = θ2) on information about the startup (e.g., P(S|I) = θ3), just on the fact that they are a startup and the historical probabilities that a startup will be unsuccessful or successful. This is more difficult than it sounds because the success of a startup is already conditionalized insofar as we have to delimit the scope of the concept "startup" in some way in order to measure the probabilities of success or not. So let us say we will look at startups confined to some region near the investor's place of residence, using the state or province level statistics on startup success.
Can you use the startup success statistics, your "priors", to make successful investment decisions? My guess is that the rate of success for startups in your region is below 50%, so if the probability of any given startup being successful is below 50%, it is unlikely you will ever invest. You would have to invest randomly according to a "priors only" strategy (i.e., P(S) = θ1 and P(U) = θ2) and that would produce losses.
To get more leverage on making good angel investments, you will need to incorporate information about the startup in your classification decision regarding the likely success or not of the startup. You will want to identify types of information that have good diagnostic value in classifying startups into bins labelled Successful (S) and Unsuccessful (U). In the example I provided yesterday I suggested that you could use your evaluation of their business plan as a good indicator of whether the startup might succeed or not. If the business plan addresses enough of your checklist of concerns, then you assign the business plan a "Pass" value (1), otherwise you assign it a "Fail" value (0). The question then becomes whether our pass/fail assignments can be used to successfully distinguish between successful and unsuccessful startups. In other words, how diagnostic is a good business plan of being a successful startup?
In my last blog, I entered observations of 4 startups into my classifier diagnostics program. Each observation consisted of two values, a value specifying whether the business plan passed (1) or
failed (0), and a value specifying whether the startup eventually succeeded (1) or failed (0) in their enterprise. When I entered the data into my classifier diagnostics
program it generated the output below. I have removed some of the statistics being reported because I want to focus on the foundational concepts in diagnostic problem solving.

                      Successful Company
                      Yes         No
Business Plan  Pass   2 (TP)      0 (FP)
               Fail   1 (FN)      1 (TN)

                      Successful Company
                      Yes         No
Business Plan  Pass   0.67 (TP)   0.00 (FP)
               Fail   0.33 (FN)   1.00 (TN)

Test Sensitivity (TP)    0.67
False Alarm Rate (FP)    0.00
Miss Rate (FN)           0.33
Test Specificity (TN)    1.00
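The counts in the frequency table above come from tallying the four observations into cells. Here is a minimal sketch of that tally. The function name tally_crosstab and the array keys are hypothetical and are not part of ClassifierDiagnostics.php; each observation is assumed to be a [business plan, success] pair with 1 = Pass/Yes and 0 = Fail/No.

```php
<?php
// Hypothetical helper (not part of ClassifierDiagnostics.php): tally a 2x2
// frequency table from [test, class] pairs, where 1 = positive, 0 = negative.
function tally_crosstab(array $observations): array
{
    $counts = ['TP' => 0, 'FP' => 0, 'FN' => 0, 'TN' => 0];
    foreach ($observations as [$test, $class]) {
        if ($test == 1 && $class == 1) {
            $counts['TP']++; // plan passed, startup succeeded
        } elseif ($test == 1 && $class == 0) {
            $counts['FP']++; // plan passed, startup failed (false alarm)
        } elseif ($test == 0 && $class == 1) {
            $counts['FN']++; // plan failed, startup succeeded (miss)
        } else {
            $counts['TN']++; // plan failed, startup failed
        }
    }
    return $counts;
}

// The four startups: two pass/success, one fail/fail, one fail/success.
$observations = [[1, 1], [0, 0], [1, 1], [0, 1]];
$counts = tally_crosstab($observations);
print_r($counts);
```

With these four observations the tally gives TP = 2, FP = 0, FN = 1, and TN = 1, which matches the first table.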
One critical observation to make about this data is that business plan quality is not a perfect test for classifying startups as successful or unsuccessful. The most grievous error is the case where a startup had a failing business plan but ended up being successful (example of a "miss" or false negative). The test "missed" the correct classification. Because we have
such a low sample size, 4 startups, this one error throws our percentages around quite a bit.
What we are looking for in a good test of startup success is one that has high Test Sensitivity and high Test Specificity. Test Sensitivity measures the proportion of actual positives which are correctly identified as such. Test Specificity measures the proportion of actual negatives which are correctly identified as such. In real life, test sensitivity and specificity are seldom 1, so we have to figure out how we will cope with false alarms (negative instances identified as positive instances) and misses (positive instances identified as negative instances). Risk-averse angel investors will likely be more worried about false alarms than misses because in the case of a false alarm you could invest in an unsuccessful company and lose money, whereas in the case of a miss you will not have invested in a successful company but will have at least retained your money.
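To make these four rates concrete, here is a small sketch that computes them from the cell counts (TP = 2, FP = 0, FN = 1, TN = 1). The function name diagnostic_rates and its array keys are hypothetical, chosen only for illustration. Sensitivity and miss rate are taken over the startups that actually succeeded, while specificity and false alarm rate are taken over the startups that actually failed.

```php
<?php
// Hypothetical helper: rates conditioned on the actual class (column totals).
function diagnostic_rates(int $tp, int $fp, int $fn, int $tn): array
{
    $positives = $tp + $fn; // startups that actually succeeded
    $negatives = $fp + $tn; // startups that actually failed
    if ($positives === 0 || $negatives === 0) {
        throw new InvalidArgumentException(
            'Need at least one actual positive and one actual negative.'
        );
    }
    return [
        'sensitivity'      => $tp / $positives, // true positive rate
        'miss_rate'        => $fn / $positives, // false negative rate
        'false_alarm_rate' => $fp / $negatives, // false positive rate
        'specificity'      => $tn / $negatives, // true negative rate
    ];
}

$rates = diagnostic_rates(2, 0, 1, 1);
foreach ($rates as $name => $value) {
    printf("%-17s %.2f\n", $name, $value);
}
```

Sensitivity and miss rate always sum to 1, as do specificity and false alarm rate, so a test can be summarized by just two of the four numbers.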
One way to proceed towards becoming a Bayesian Angel Investor is to do some diagnostic work and figure out what types of tests are the best to use in order to classify startups into those who will succeed or not. When evaluating tests to use, you should examine the diagnostic accuracy of your tests using some of the metrics provided above (Test Sensitivity, False Alarm Rate, Miss Rate, Test Specificity). Bayesian Angel Investing likely taps into the same problem solving skills as a doctor who must diagnose whether a patient has cancer or not. They will order up a series of tests (often binary scored) and make a diagnosis, or, if matters are still unclear, order up more tests (e.g., scans, probes, incisions, etc...) so that they can achieve more confidence in their decision making.

A Classification Framework for Bayesian Angel Investing 
[Bayesian Inference] 
Posted on April 4, 2013 @ 10:30:00 AM by Paul Meagher
This blog is a followup to yesterday's blog introducing the idea of Bayesian Angel Investing.
In 2004 I wrote 3 articles for IBM developerWorks on Bayesian inference and developed PHP-based code to explore the topic with. I'd like to follow up on some of that work by exploring how Bayesian inference might be applied to Angel Investing.
It is hard to pick a starting point for this investigation. I thought the best way to begin would be to give a quick demo of how to use a ClassifierDiagnostics.php class I developed to analyze the relationship between two binary-valued variables (a "test" variable and a "classification" variable). Doing so will introduce you to many concepts, calculations, and stats you should be familiar with if you want to apply Bayesian inference to Angel Investing.
The two variables we will be analyzing in the demo code below are a "Business Plan Quality" test variable and a "Successful Company" classification variable. The data we will be inputting to our software for analysis will consist of a binary rating of Business Plan Quality (0=Fail, 1=Pass) and a binary rating for the Successful Company variable (0=Not Successful, 1=Successful). Each of the four $data records below corresponds to an observation conducted on one startup company: the Business Plan Quality rating for that startup and its eventual success or failure. One question to investigate is whether the Business Plan Quality measurement should be used as a "test" for diagnosing whether a startup company will be successful or not.
Without further ado, here is the source code for the business_plan_and_success.php demo script which invokes input, analysis, and output functions supplied by the ClassifierDiagnostics.php class.
<?php
/**
 * business_plan_and_success.php
 *
 * Compute joint frequency and joint probability of two
 * variables: business plan quality (0=Fail, 1=Pass) and
 * company success (0=No, 1=Yes). Displays joint
 * frequency table, joint probability table, and various
 * diagnostic statistics about the relationship between
 * the variables.
 */
require_once "ClassifierDiagnostics.php";

$data[0] = array("1", "1"); // Startup 1: BizPlan=Pass, Success=Yes
$data[1] = array("0", "0"); // Startup 2: BizPlan=Fail, Success=No
$data[2] = array("1", "1"); // Startup 3: BizPlan=Pass, Success=Yes
$data[3] = array("0", "1"); // Startup 4: BizPlan=Fail, Success=Yes

$classifier = new ClassifierDiagnostics($data);

$classifier->setRowName("Business Plan");
$classifier->setRowTrue("Pass");
$classifier->setRowFalse("Fail");

$classifier->setColumnName("Successful Company");
$classifier->setColumnTrue("Yes");
$classifier->setColumnFalse("No");

$classifier->showCrossTabs();
$classifier->showStats();
?>
Below is the output generated by running the demo script. The first set of tables below are the joint frequency and joint probability tables. Underneath these tables are various diagnostic stats that can be used to assess the quality of your "test" variable (i.e., Business Plan Quality) in classifying a startup as being successful or not.

                      Successful Company
                      Yes         No
Business Plan  Pass   2 (TP)      0 (FP)
               Fail   1 (FN)      1 (TN)

                      Successful Company
                      Yes         No
Business Plan  Pass   0.67 (TP)   0.00 (FP)
               Fail   0.33 (FN)   1.00 (TN)

Test Sensitivity (TP)      0.67
False Alarm Rate (FP)      0.00
Miss Rate (FN)             0.33
Test Specificity (TN)      1.00
Base Rate                  0.75
P(+Test)                   0.50
P(-Test)                   0.50
P(+Class | +Test)          1.00
P(-Class | +Test)          0.00
P(+Class | -Test)          0.50
P(-Class | -Test)          0.50
Likelihood Ratio(+Test)    0.00
Likelihood Ratio(-Test)    0.33
Accuracy                   0.75
Gain                       1.33
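Several of the stats above follow directly from the four cell counts. The sketch below recomputes them by hand as a cross-check. The variable names and array keys are hypothetical and it does not use ClassifierDiagnostics.php. Likelihood Ratio(+Test) is left out on purpose: it divides the sensitivity (0.67) by the false alarm rate (0.00), so the 0.00 reported above looks like a side effect of a division by zero and not a meaningful ratio.

```php
<?php
// Cell counts from the frequency table.
$tp = 2; // plan passed, startup succeeded
$fp = 0; // plan passed, startup failed
$fn = 1; // plan failed, startup succeeded
$tn = 1; // plan failed, startup failed
$n  = $tp + $fp + $fn + $tn;

$stats = [
    'base_rate'         => ($tp + $fn) / $n,          // P(+Class)
    'p_pos_test'        => ($tp + $fp) / $n,          // P(+Test)
    'p_class_given_pos' => $tp / ($tp + $fp),         // P(+Class | +Test)
    'p_class_given_neg' => $fn / ($fn + $tn),         // P(+Class | -Test)
    'accuracy'          => ($tp + $tn) / $n,          // share of correct calls
];

// Gain: how much a passing plan lifts the probability of success
// over the base rate.
$stats['gain'] = $stats['p_class_given_pos'] / $stats['base_rate'];

// Likelihood ratio for a failing plan: miss rate divided by specificity.
$stats['lr_neg_test'] = ($fn / ($tp + $fn)) / ($tn / ($fp + $tn));

foreach ($stats as $name => $value) {
    printf("%-18s %.2f\n", $name, $value);
}
```

A gain of 1.33 means that, in this tiny sample, a passing business plan raises the estimated probability of success from the base rate of 0.75 to 1.00.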
I'll return to discussing some of the stats being reported here in a later blog. For now, I'd like to complete the technical part of the demo by showing you the source code for the ClassifierDiagnostics.php object. If you put the ClassifierDiagnostics.php object in the same PHP-enabled folder as the business_plan_and_success.php demo script, then point your browser at the demo script, you will see the output above.
<?php
/**
 * @package ClassifierDiagnostics
 * @author Paul Meagher <paul@datavore.com>
 * @license PHP License v3.0
 * @version 0.2
 *
 * The primary references I used when developing this class were:
 *
 * @see http://hippocrates.ouhsc.edu/cdmtutor/2x2/2x2tut2.html
 * @see http://www.musc.edu/dc/icrebm/sensitivity.html
 *
 * Sheskin, David. (2004) Handbook of Parametric and Nonparametric
 * Statistical Procedures (pp. 245-333).
 */
class ClassifierDiagnostics {

  var $data          = array();
  var $joint_freq    = array();
  var $joint_prob    = array();
  var $col_marginals = array();
  var $row_marginals = array();

  // default labels for crosstab display
  var $row_name  = "test";
  var $col_name  = "class";
  var $row_true  = "+";
  var $row_false = "-";
  var $col_true  = "+";
  var $col_false = "-";

  /*
   * If a two column data matrix is supplied to the class, it will
   * proceed to compute various accuracy metrics from this data.
   * Otherwise, use the loadJointFrequency method to bypass having
   * to feed in raw data.
   */
  function __construct($data="empty") {
    if ($data != "empty") {
      $this->data = $data;
      // zero the cell counts
      $this->joint_freq[0][0] = 0; // True Negative  - TN
      $this->joint_freq[0][1] = 0; // False Negative - FN
      $this->joint_freq[1][0] = 0; // False Positive - FP
      $this->joint_freq[1][1] = 0; // True Positive  - TP
      // zero the corresponding cell probabilities
      $this->joint_prob[0][0] = 0;
      $this->joint_prob[0][1] = 0;
      $this->joint_prob[1][0] = 0;
      $this->joint_prob[1][1] = 0;
      $this->getJointFrequency();
      $this->getColumnMarginals();
      $this->getRowMarginals();
      $this->getJointProbability();
    }
  }

  /*
   * Load joint frequency distribution directly instead of
   * building it from supplied training data.
   */
  function loadJointFrequency($joint_freq) {
    $this->joint_freq = $joint_freq;
    $this->getColumnMarginals();
    $this->getRowMarginals();
    $this->getJointProbability();
  }

  /**
   * First index in joint_freq[t][c] matrix refers
   * to test outcome while the second index refers
   * to the classification outcome.
   */
  function getJointFrequency() {
    $nrows = count($this->data);
    for ($i=0; $i < $nrows; $i++) {
      // tally true negatives (TN): -test AND -class (aka specificity of test)
      if ( ($this->data[$i][0] == 0) AND ($this->data[$i][1] == 0) ) {
        $this->joint_freq[0][0]++;
      }
      // tally false negatives (FN): -test AND +class
      if ( ($this->data[$i][0] == 0) AND ($this->data[$i][1] == 1) ) {
        $this->joint_freq[0][1]++;
      }
      // tally false positives (FP): +test AND -class
      if ( ($this->data[$i][0] == 1) AND ($this->data[$i][1] == 0) ) {
        $this->joint_freq[1][0]++;
      }
      // tally true positives (TP): +test AND +class (aka sensitivity of test)
      if ( ($this->data[$i][0] == 1) AND ($this->data[$i][1] == 1) ) {
        $this->joint_freq[1][1]++;
      }
    }
  }

  function getRowMarginals() {
    $this->row_marginals[0] = $this->joint_freq[0][0] + $this->joint_freq[0][1];
    $this->row_marginals[1] = $this->joint_freq[1][0] + $this->joint_freq[1][1];
  }

  function getColumnMarginals() {
    $this->col_marginals[0] = $this->joint_freq[0][0] + $this->joint_freq[1][0];
    $this->col_marginals[1] = $this->joint_freq[0][1] + $this->joint_freq[1][1];
  }

  // Each cell is divided by its column (class) total.
  function getJointProbability() {
    $this->joint_prob[0][0] = $this->joint_freq[0][0] / $this->col_marginals[0];
    $this->joint_prob[1][0] = $this->joint_freq[1][0] / $this->col_marginals[0];
    $this->joint_prob[0][1] = $this->joint_freq[0][1] / $this->col_marginals[1];
    $this->joint_prob[1][1] = $this->joint_freq[1][1] / $this->col_marginals[1];
  }

  function getTruePositiveRate() {
    return $this->joint_prob[1][1];
  }

  function getTrueNegativeRate() {
    return $this->joint_prob[0][0];
  }

  function getFalsePositiveRate() {
    return $this->joint_prob[1][0];
  }

  function getFalseNegativeRate() {
    return $this->joint_prob[0][1];
  }

  function getBaseRate() {
    return ($this->joint_freq[1][1] + $this->joint_freq[0][1]) / array_sum($this->col_marginals);
  }

  // P(class = $col_status | test = $row_status)
  function getPosterior($row_status, $col_status) {
    if ($row_status == 1) {
      if ($col_status == 1) {
        return $this->joint_freq[1][1] / ( $this->joint_freq[1][1] + $this->joint_freq[1][0] );
      } else {
        return $this->joint_freq[1][0] / ( $this->joint_freq[1][1] + $this->joint_freq[1][0] );
      }
    } else {
      if ($col_status == 1) {
        return $this->joint_freq[0][1] / ( $this->joint_freq[0][1] + $this->joint_freq[0][0] );
      } else {
        return $this->joint_freq[0][0] / ( $this->joint_freq[0][1] + $this->joint_freq[0][0] );
      }
    }
  }

  function getRowProbability($row_status) {
    if ($row_status == 1) {
      return ($this->joint_freq[1][1] + $this->joint_freq[1][0]) / array_sum($this->row_marginals);
    } else {
      return ($this->joint_freq[0][1] + $this->joint_freq[0][0]) / array_sum($this->row_marginals);
    }
  }

  function getAccuracy() {
    return ($this->joint_freq[1][1] + $this->joint_freq[0][0]) / array_sum($this->row_marginals);
  }

  function getLikelihoodRatio($row_status) {
    if ($row_status == 1) {
      $numerator   = $this->joint_freq[1][1] / $this->col_marginals[1];
      $denominator = $this->joint_freq[1][0] / $this->col_marginals[0];
    } else {
      $numerator   = $this->joint_freq[0][1] / $this->col_marginals[1];
      $denominator = $this->joint_freq[0][0] / $this->col_marginals[0];
    }
    // The ratio is undefined when the denominator is zero; report 0 in that case.
    if ($denominator == 0) {
      return 0;
    }
    return $numerator / $denominator;
  }

  function getGain() {
    return $this->getPosterior(1,1) / $this->getBaseRate();
  }

  function setRowName($row_name) {
    $this->row_name = $row_name;
  }

  function setColumnName($col_name) {
    $this->col_name = $col_name;
  }

  function setRowTrue($row_true) {
    $this->row_true = $row_true;
  }

  function setRowFalse($row_false) {
    $this->row_false = $row_false;
  }

  function setColumnTrue($col_true) {
    $this->col_true = $col_true;
  }

  function setColumnFalse($col_false) {
    $this->col_false = $col_false;
  }

  function showCrossTabs() {
    ?>
    <table cellpadding='15' align='center'>
      <tr>
        <td>
          <?php $this->showTable($this->joint_freq); ?>
        </td>
        <td>
          <?php $this->showTable($this->joint_prob, "%01.2f"); ?>
        </td>
      </tr>
    </table>
    <?php
  }

  function showTable($matrix, $format="%u") {
    ?>
    <table border='1' cellspacing='1' cellpadding='8'>
      <tr>
        <td rowspan='2' colspan='2'> </td>
        <td colspan='2' align='center'><b><?php echo $this->col_name ?></b></td>
      </tr>
      <tr>
        <td align='center' height='20' bgcolor='silver'><?php echo $this->col_true ?></td>
        <td align='center' height='20' bgcolor='silver'><?php echo $this->col_false ?></td>
      </tr>
      <tr>
        <td rowspan='2' align='center'><b><?php echo $this->row_name ?></b></td>
        <td align='center' width='20' bgcolor='silver'><?php echo $this->row_true ?></td>
        <td align='center'><?php printf($format, $matrix[1][1]); ?><br/>(TP)</td>
        <td align='center'><?php printf($format, $matrix[1][0]); ?><br/>(FP)</td>
      </tr>
      <tr>
        <td align='center' bgcolor='silver'><?php echo $this->row_false ?></td>
        <td align='center'><?php printf($format, $matrix[0][1]); ?><br/>(FN)</td>
        <td align='center'><?php printf($format, $matrix[0][0]); ?><br/>(TN)</td>
      </tr>
    </table>
    <?php
  }

  function showStats() {
    ?>
    <table align='center' cellpadding='5'>
      <tr bgcolor='silver'>
        <td>Test Sensitivity (TP)</td>
        <td><?php printf("%01.2f", $this->getTruePositiveRate()); ?></td>
      </tr>
      <tr bgcolor='silver'>
        <td>False Alarm Rate (FP)</td>
        <td><?php printf("%01.2f", $this->getFalsePositiveRate()); ?></td>
      </tr>
      <tr bgcolor='silver'>
        <td>Miss Rate (FN)</td>
        <td><?php printf("%01.2f", $this->getFalseNegativeRate()); ?></td>
      </tr>
      <tr bgcolor='silver'>
        <td>Test Specificity (TN)</td>
        <td><?php printf("%01.2f", $this->getTrueNegativeRate()); ?></td>
      </tr>
      <tr>
        <td>Base Rate</td>
        <td><?php printf("%01.2f", $this->getBaseRate()); ?></td>
      </tr>
      <tr>
        <td>P(+Test)</td>
        <td><?php printf("%01.2f", $this->getRowProbability(1)); ?></td>
      </tr>
      <tr>
        <td>P(-Test)</td>
        <td><?php printf("%01.2f", $this->getRowProbability(0)); ?></td>
      </tr>
      <tr>
        <td>P(+Class | +Test)</td>
        <td><?php printf("%01.2f", $this->getPosterior(1, 1)); ?></td>
      </tr>
      <tr>
        <td>P(-Class | +Test)</td>
        <td><?php printf("%01.2f", $this->getPosterior(1, 0)); ?></td>
      </tr>
      <tr>
        <td>P(+Class | -Test)</td>
        <td><?php printf("%01.2f", $this->getPosterior(0, 1)); ?></td>
      </tr>
      <tr>
        <td>P(-Class | -Test)</td>
        <td><?php printf("%01.2f", $this->getPosterior(0, 0)); ?></td>
      </tr>
      <tr>
        <td>Likelihood Ratio(+Test)</td>
        <td><?php printf("%01.2f", $this->getLikelihoodRatio(1)); ?></td>
      </tr>
      <tr>
        <td>Likelihood Ratio(-Test)</td>
        <td><?php printf("%01.2f", $this->getLikelihoodRatio(0)); ?></td>
      </tr>
      <tr>
        <td>Accuracy</td>
        <td><?php printf("%01.2f", $this->getAccuracy()); ?></td>
      </tr>
      <tr>
        <td>Gain</td>
        <td><?php printf("%01.2f", $this->getGain()); ?></td>
      </tr>
    </table>
    <?php
  }

}
?>

Introduction to Bayesian Angel Investing 
[Bayesian Inference] 
Posted on April 3, 2013 @ 05:47:00 AM by Paul Meagher
In Bayesian Angel Investing, you calculate the prior and posterior probability of an investment outcome to arrive at good decisions regarding those investments.
Let us see how it might work in the context of making a decision to invest in a startup company.
When an investor encounters an opportunity to invest in a startup company their goal is likely not to make an investment decision right away, but rather a decision on whether it is worth allocating time to pursue the opportunity further.
So, if a proposal meets the investor's checklist of positive attributes:
+ good management
+ good idea
+ good business plan
+ good deal
This might get the Bayesian Investor sufficiently motivated to start calculating the prior probability that the startup company might be worth investing in.
So if you assign a prior probability of 60% that the company might be worth investing in, you will need more information to move the probability upwards in order to finalize any deal.
You will want to meet via email, phone, and possibly in person to further discuss the proposal.
A Bayesian Investor can move towards a final decision by setting a decision making threshold of, say, 80% on the prior probability estimate (e.g., that the company will be successful S or not ~S). If the prior probability estimate of the startup being successful reaches or exceeds 80%, then invest in the company. If further information causes the prior probability to go below 50%, then don't invest. Prior estimates beget posterior estimates which become the priors in the next round of due diligence.
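The threshold rule described above can be sketched as a small function. This is only an illustration: the function name investment_decision and the "gather more information" outcome for estimates between the two cutoffs are assumptions, while the 80% and 50% cutoffs are the example values from the text.

```php
<?php
// Hypothetical decision rule. The 0.80 and 0.50 defaults are the example
// thresholds from the text; the middle outcome is an assumption.
function investment_decision(
    float $probability,
    float $invest_at = 0.80,
    float $reject_below = 0.50
): string {
    if ($probability >= $invest_at) {
        return 'invest';
    }
    if ($probability < $reject_below) {
        return 'do not invest';
    }
    return 'gather more information';
}

echo investment_decision(0.60), "\n"; // gather more information
echo investment_decision(0.85), "\n"; // invest
echo investment_decision(0.45), "\n"; // do not invest
```

An estimate between the two cutoffs, such as the 60% starting point, leaves the decision open, which is what sends the investor into another round of due diligence.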
The way a Bayesian Investor moves towards making an investment decision is by gathering more information about the company. The information that is gathered should be diagnostic of whether the company is likely to succeed. This is similar to the way a medical doctor orders tests to either confirm or disconfirm a hypothesis related to the prior hypothesis (e.g., the diagnostic possibilities: has cancer, does not have cancer).
We will try to formalize Bayesian investing more in a later blog post using this formula, p(H|E) = p(H∩E) / p(E), as our starting point (where H stands for Hypothesis and E for Evidence).
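As a rough sketch of how this formula could drive successive rounds of due diligence, the code below expands p(H∩E) as p(E|H) p(H) and p(E) as p(E|H) p(H) + p(E|~H) p(~H). The function name bayes_posterior and all of the likelihood values are hypothetical numbers picked for illustration; only the 60% starting prior comes from the example above.

```php
<?php
// Hypothetical helper: p(H|E) = p(H ∩ E) / p(E), with both terms built
// from the prior and the likelihood of the evidence under H and under not-H.
function bayes_posterior(float $prior, float $p_e_given_h, float $p_e_given_not_h): float
{
    $joint    = $p_e_given_h * $prior;                       // p(H ∩ E)
    $evidence = $joint + $p_e_given_not_h * (1.0 - $prior);  // p(E)
    if ($evidence <= 0.0) {
        throw new InvalidArgumentException('Evidence has zero probability under these inputs.');
    }
    return $joint / $evidence;                               // p(H | E)
}

// Round 1: start from the 60% prior. The likelihoods 0.8 and 0.3 are
// made-up values for "a good first meeting" given success and given failure.
$after_meeting = bayes_posterior(0.60, 0.8, 0.3);
printf("%.2f\n", $after_meeting); // prints 0.80

// Round 2: the posterior becomes the new prior. Likelihoods again made up.
$after_references = bayes_posterior($after_meeting, 0.9, 0.3);
printf("%.2f\n", $after_references); // prints 0.92
```

Evidence only moves the estimate when it is more likely under one hypothesis than the other. If the two likelihoods are equal, the posterior equals the prior.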



