<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Bayesian inference of multinomial distribution parameters</title>
	<atom:link href="http://data-sorcery.org/2009/07/01/bayes-multinomial/feed/" rel="self" type="application/rss+xml" />
	<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/</link>
	<description></description>
	<lastBuildDate>Sat, 05 May 2012 13:46:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: incanter</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-73</link>
		<dc:creator><![CDATA[incanter]]></dc:creator>
		<pubDate>Wed, 23 Sep 2009 14:02:42 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-73</guid>
		<description><![CDATA[I like it, seems like a reasonable approach.]]></description>
		<content:encoded><![CDATA[<p>I like it, seems like a reasonable approach.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-72</link>
		<dc:creator><![CDATA[Dave]]></dc:creator>
		<pubDate>Wed, 23 Sep 2009 12:25:43 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-72</guid>
		<description><![CDATA[May not have explained this well: the previous iteration provides update info for the next iteration.  Assume three categories and we start with 4 in cat1, 3 in cat2, and 3 in cat3 (y=(4,3,3)).  We assume we know nothing about the distro so we set alpha=(1,1,1).  We generate 10k Dirichlet draws and calc the means (.38, .31, .31).  We now use these numbers as alpha and collect another sample such that y=(4,4,3).  We generate 10k Dirichlet draws again with updated y and alpha and get new means (.36,.36,.28).  We now use these means as alpha and collect another sample such that y=(5,4,3).  All the while, we are checking simultaneous confidence intervals for each multinomial param.  When they are all within some certainty threshold, we stop and look at how many samples we have collected.  This is our minimum sample size.  May be missing the boat here, but this is my interpretation of Bayesian updating.  Let me know if I am way off course here.]]></description>
		<content:encoded><![CDATA[<p>May not have explained this well: the previous iteration provides update info for the next iteration.  Assume three categories and we start with 4 in cat1, 3 in cat2, and 3 in cat3 (y=(4,3,3)).  We assume we know nothing about the distro so we set alpha=(1,1,1).  We generate 10k Dirichlet draws and calc the means (.38, .31, .31).  We now use these numbers as alpha and collect another sample such that y=(4,4,3).  We generate 10k Dirichlet draws again with updated y and alpha and get new means (.36,.36,.28).  We now use these means as alpha and collect another sample such that y=(5,4,3).  All the while, we are checking simultaneous confidence intervals for each multinomial param.  When they are all within some certainty threshold, we stop and look at how many samples we have collected.  This is our minimum sample size.  May be missing the boat here, but this is my interpretation of Bayesian updating.  Let me know if I am way off course here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: incanter</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-71</link>
		<dc:creator><![CDATA[incanter]]></dc:creator>
		<pubDate>Tue, 22 Sep 2009 22:04:42 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-71</guid>
		<description><![CDATA[That should work, although I&#039;m not sure you gain much by using the estimated means from the previous iteration; they should be stationary.]]></description>
		<content:encoded><![CDATA[<p>That should work, although I&#8217;m not sure you gain much by using the estimated means from the previous iteration; they should be stationary.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-70</link>
		<dc:creator><![CDATA[Dave]]></dc:creator>
		<pubDate>Tue, 22 Sep 2009 21:54:21 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-70</guid>
		<description><![CDATA[Alright, so would like some feedback on an approach to this.  What I propose is more incremental and Bayesian in nature.  Assume you only have 3 categories and start off with prior Dirichlet params of 1,1,1 (similar to the last part of above).  You then wait until you have data represented in all 3 categories (let&#039;s say this is n=10 and is distributed as 4,3,3).  You then calculate the posterior as above, (def props (sample-dirichlet 1000 (plus y alpha))), with y=(4,3,3) and alpha=(1,1,1).  You can then calculate means, sd, and CI for each param.  If CI is within a given threshold, you are done (n is the sample size).  If not, you increment n and take your next sample.  You use the calculated means as your new Dirichlet params and recalculate the posterior with updated y (e.g. 5,3,3).  Does this at all make sense?  Thanks]]></description>
		<content:encoded><![CDATA[<p>Alright, so would like some feedback on an approach to this.  What I propose is more incremental and Bayesian in nature.  Assume you only have 3 categories and start off with prior Dirichlet params of 1,1,1 (similar to the last part of above).  You then wait until you have data represented in all 3 categories (let&#8217;s say this is n=10 and is distributed as 4,3,3).  You then calculate the posterior as above, (def props (sample-dirichlet 1000 (plus y alpha))), with y=(4,3,3) and alpha=(1,1,1).  You can then calculate means, sd, and CI for each param.  If CI is within a given threshold, you are done (n is the sample size).  If not, you increment n and take your next sample.  You use the calculated means as your new Dirichlet params and recalculate the posterior with updated y (e.g. 5,3,3).  Does this at all make sense?  Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-69</link>
		<dc:creator><![CDATA[Dave]]></dc:creator>
		<pubDate>Tue, 22 Sep 2009 17:33:55 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-69</guid>
		<description><![CDATA[This approach does not seem to take into account the previous results.  I.e., I start off and guess my initial params and simulate.  I should then simulate again using updated params from the simulation I just ran.  Somehow during all of this, I must also vary n.  Just not sure how to approach this from a purely Bayesian standpoint.  Thanks for any/all help.]]></description>
		<content:encoded><![CDATA[<p>This approach does not seem to take into account the previous results.  I.e., I start off and guess my initial params and simulate.  I should then simulate again using updated params from the simulation I just ran.  Somehow during all of this, I must also vary n.  Just not sure how to approach this from a purely Bayesian standpoint.  Thanks for any/all help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: incanter</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-68</link>
		<dc:creator><![CDATA[incanter]]></dc:creator>
		<pubDate>Fri, 18 Sep 2009 17:52:11 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-68</guid>
		<description><![CDATA[An alternative approach is to perform multiple simulations with different sample sizes (e.g. n1, n2, ...) using the sample-multinomial-params function. 

(def counts1 (mult n1 [p1 p2 p3]))
(def counts2 (mult n2 [p1 p2 p3]))
...

where p1, p2, p3 are the proportion parameters of your simulated multinomial distribution.

Now generate a sample of 1000 multinomial-parameters based on each of the different values of n:

(def params1 (sample-multinomial-params 1000 counts1))
(def params2 (sample-multinomial-params 1000 counts2))

Finally, plot the histograms for the simulated parameters and use the quantile function (just like in this post) to find the simulated-sample based on the smallest n with an acceptable confidence interval.

David]]></description>
		<content:encoded><![CDATA[<p>An alternative approach is to perform multiple simulations with different sample sizes (e.g. n1, n2, &#8230;) using the sample-multinomial-params function. </p>
<p>(def counts1 (mult n1 [p1 p2 p3]))<br />
(def counts2 (mult n2 [p1 p2 p3]))<br />
&#8230;</p>
<p>where p1, p2, p3 are the proportion parameters of your simulated multinomial distribution.</p>
<p>Now generate a sample of 1000 multinomial-parameters based on each of the different values of n:</p>
<p>(def params1 (sample-multinomial-params 1000 counts1))<br />
(def params2 (sample-multinomial-params 1000 counts2))</p>
<p>Finally, plot the histograms for the simulated parameters and use the quantile function (just like in this post) to find the simulated-sample based on the smallest n with an acceptable confidence interval.</p>
<p>David</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-67</link>
		<dc:creator><![CDATA[Dave]]></dc:creator>
		<pubDate>Fri, 18 Sep 2009 16:14:05 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-67</guid>
		<description><![CDATA[Thanks for the quick response.  Yes, am familiar with that paper and it hits the nail on the head of what I am trying to do, just not how I want to do it.  Want to use Bayesian techniques (should have made that clear in my post) to come up with a sample size determination which is what led me to your post.  Believe what you are doing above is a starting point, just not sure how to take it to the next level.  Have read some papers on estimating multinomial params using Bayes (usually Dirichlet), just looking for a solid example/implementation to fill in the gaps for me.  Thanks.]]></description>
		<content:encoded><![CDATA[<p>Thanks for the quick response.  Yes, am familiar with that paper and it hits the nail on the head of what I am trying to do, just not how I want to do it.  Want to use Bayesian techniques (should have made that clear in my post) to come up with a sample size determination which is what led me to your post.  Believe what you are doing above is a starting point, just not sure how to take it to the next level.  Have read some papers on estimating multinomial params using Bayes (usually Dirichlet), just looking for a solid example/implementation to fill in the gaps for me.  Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: incanter</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-66</link>
		<dc:creator><![CDATA[incanter]]></dc:creator>
		<pubDate>Fri, 18 Sep 2009 13:52:52 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-66</guid>
		<description><![CDATA[Dave,

That is a great question, and one I should write an entire post about. 

In the meantime, the only thing I can offer is yet another paper. The whitepaper, &lt;a href=&quot;http://www.nawrs.org/ClevelandPDF/chakra2.PDF&quot; rel=&quot;nofollow&quot;&gt;&quot;Sample Size Determination for Multinomial Population&quot;&lt;/a&gt; (pdf) by Subinoy Chakravarty is a nice summary of an approach with the associated references. 

Even better, it has a nice table (table 1 on pg 2) for finding the sample size (n) for a given significance-level (alpha) and half-width of the confidence interval around the estimated proportions (d). For instance, to estimate the proportion parameters of a multinomial distribution within +/- 0.02 at a significance-level of 0.05, the minimum sample size, based on table 1, would be 3184.

I hope to create some power calculation functions for Incanter, and then write some posts on them, in the future.

David]]></description>
		<content:encoded><![CDATA[<p>Dave,</p>
<p>That is a great question, and one I should write an entire post about. </p>
<p>In the meantime, the only thing I can offer is yet another paper. The whitepaper, <a href="http://www.nawrs.org/ClevelandPDF/chakra2.PDF" rel="nofollow">&#8220;Sample Size Determination for Multinomial Population&#8221;</a> (pdf) by Subinoy Chakravarty is a nice summary of an approach with the associated references. </p>
<p>Even better, it has a nice table (table 1 on pg 2) for finding the sample size (n) for a given significance-level (alpha) and half-width of the confidence interval around the estimated proportions (d). For instance, to estimate the proportion parameters of a multinomial distribution within +/- 0.02 at a significance-level of 0.05, the minimum sample size, based on table 1, would be 3184.</p>
<p>I hope to create some power calculation functions for Incanter, and then write some posts on them, in the future.</p>
<p>David</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-65</link>
		<dc:creator><![CDATA[Dave]]></dc:creator>
		<pubDate>Fri, 18 Sep 2009 11:56:40 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-65</guid>
		<description><![CDATA[Was wondering how you could determine the minimum sample size required for the above example (i.e. the minimum amount of data needed for accurate prediction).  Have read some papers on this, but am unsure how you would implement in R or Clojure.  Thanks]]></description>
		<content:encoded><![CDATA[<p>Was wondering how you could determine the minimum sample size required for the above example (i.e. the minimum amount of data needed for accurate prediction).  Have read some papers on this, but am unsure how you would implement in R or Clojure.  Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: incanter</title>
		<link>http://data-sorcery.org/2009/07/01/bayes-multinomial/#comment-48</link>
		<dc:creator><![CDATA[incanter]]></dc:creator>
		<pubDate>Wed, 12 Aug 2009 21:01:50 +0000</pubDate>
		<guid isPermaLink="false">http://incanter.wordpress.com/?p=474#comment-48</guid>
		<description><![CDATA[Ah, if you want to use a Bayesian method to model the data, check out BEAM (Bayesian Estimation of Array Measurements).

Here&#039;s the original paper:

http://cbcl.mit.edu/projects/cbcl/publications/ps/denoising_recomb.pdf

And here&#039;s the BEAM website, which has links to a web-based version of the program so you can upload your data directly and have it return the results.

http://people.csail.mit.edu/rondror/BEAM/]]></description>
		<content:encoded><![CDATA[<p>Ah, if you want to use a Bayesian method to model the data, check out BEAM (Bayesian Estimation of Array Measurements).</p>
<p>Here&#8217;s the original paper:</p>
<p><a href="http://cbcl.mit.edu/projects/cbcl/publications/ps/denoising_recomb.pdf" rel="nofollow">http://cbcl.mit.edu/projects/cbcl/publications/ps/denoising_recomb.pdf</a></p>
<p>And here&#8217;s the BEAM website, which has links to a web-based version of the program so you can upload your data directly and have it return the results.</p>
<p><a href="http://people.csail.mit.edu/rondror/BEAM/" rel="nofollow">http://people.csail.mit.edu/rondror/BEAM/</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>

