{"id":454,"date":"2016-11-15T12:27:12","date_gmt":"2016-11-15T12:27:12","guid":{"rendered":"http:\/\/pop-gen.eu\/wordpress\/?page_id=454"},"modified":"2024-07-01T20:07:02","modified_gmt":"2024-07-01T20:07:02","slug":"statistics-in-bioinformatics-using-r","status":"publish","type":"page","link":"https:\/\/pop-gen.eu\/wordpress\/statistics-in-bioinformatics-using-r","title":{"rendered":"Statistics in Bioinformatics using R"},"content":{"rendered":"<h1>Installing R<\/h1>\n<p>R is a free and open source programming language. You can find download instructions here:<\/p>\n<p><a href=\"https:\/\/www.r-project.org\/\">https:\/\/www.r-project.org\/<\/a><\/p>\n<ul>\n<li>Try to install it on your personal computer.<\/li>\n<\/ul>\n<p>Note: If you are using Linux (a Debian-based distribution), installing R is simple:<\/p>\n<pre>sudo apt-get install r-base r-base-core<\/pre>\n<h1>Using R for the course<\/h1>\n<ul>\n<li>If R is installed on the university computers, you can use it there.<\/li>\n<li>You can also log in to:<br \/>\n139.91.162.50:8787<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h1>Statistics for Bioinformatics<\/h1>\n<p><img decoding=\"async\" src=\"http:\/\/www.azquotes.com\/picture-quotes\/quote-the-only-way-to-learn-mathematics-is-to-do-mathematics-paul-halmos-60-1-0142.jpg\" alt=\"\" \/><\/p>\n<p>So, just read and try&#8230; 
it&#8217;s the best way to learn how to apply statistics to bioinformatics problems.<\/p>\n<p>&nbsp;<\/p>\n<h1>Why Statistics in Biology<\/h1>\n<ul>\n<li>Many students choose biology to avoid anything related to math.<\/li>\n<li>So why do you now have to learn statistics and apply it to biological data?<\/li>\n<\/ul>\n<h1>Statistics is the Science of Learning from Data<\/h1>\n<p>You use a small collection (a sample) to learn something about the whole (the population).<\/p>\n<p>Sample (Wikipedia):<br \/>\nIn <a title=\"Statistics\" href=\"https:\/\/en.wikipedia.org\/wiki\/Statistics\">statistics<\/a> and <a title=\"Quantitative research\" href=\"https:\/\/en.wikipedia.org\/wiki\/Quantitative_research\">quantitative research<\/a> methodology, a <b>data sample<\/b> is a set of <a title=\"Data\" href=\"https:\/\/en.wikipedia.org\/wiki\/Data\">data<\/a> collected and\/or selected from a <a title=\"Statistical population\" href=\"https:\/\/en.wikipedia.org\/wiki\/Statistical_population\">statistical population<\/a> by a defined procedure.<sup id=\"cite_ref-1\" class=\"reference\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Sample_%28statistics%29#cite_note-1\">[1]<\/a><\/sup><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"thumbimage\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/b\/bf\/Simple_random_sampling.PNG\/300px-Simple_random_sampling.PNG\" srcset=\"\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/b\/bf\/Simple_random_sampling.PNG\/450px-Simple_random_sampling.PNG 1.5x, \/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/b\/bf\/Simple_random_sampling.PNG\/600px-Simple_random_sampling.PNG 2x\" alt=\"\" width=\"300\" height=\"231\" data-file-width=\"618\" data-file-height=\"476\" \/><\/p>\n<p>A sample can be <strong>random<\/strong>: the chance of sampling an object does not depend on the properties of the object.<\/p>\n<ul>\n<li>The sample is taken from a population.<\/li>\n<li>The population can be assumed to be large, perhaps 
infinite, and unknown.<\/li>\n<li>We are interested in understanding parameters of the population.<\/li>\n<li>We study the sample in order to understand the population.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h1>Let&#8217;s start with a real example: Diet-Induced Obesity in Mice<\/h1>\n<p><a href=\"http:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-05-09.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-460 aligncenter\" src=\"http:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-05-09-300x127.png\" alt=\"screenshot-from-2016-11-14-11-05-09\" width=\"760\" height=\"322\" srcset=\"https:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-05-09-300x127.png 300w, https:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-05-09-768x324.png 768w, https:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-05-09-1024x433.png 1024w, https:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-05-09.png 1910w\" sizes=\"auto, (max-width: 706px) 89vw, (max-width: 767px) 82vw, 740px\" \/><\/a><\/p>\n<p>Click on the dataset, then go to the section at the bottom of the page:<br \/>\n<strong>Experiment design and value distribution<\/strong><\/p>\n<p>Click on the experimental design. 
You will see something similar to this:<\/p>\n<p><a href=\"http:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-24-48.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-462 aligncenter\" src=\"http:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-24-48-300x102.png\" alt=\"screenshot-from-2016-11-14-11-24-48\" width=\"850\" height=\"289\" srcset=\"https:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-24-48-300x102.png 300w, https:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-24-48-768x261.png 768w, https:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-24-48-1024x347.png 1024w, https:\/\/pop-gen.eu\/wordpress\/wp-content\/uploads\/2016\/11\/Screenshot-from-2016-11-14-11-24-48.png 1904w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/a><\/p>\n<p>This shows the design of the study. 
Let&#8217;s try to understand it:<\/p>\n<ul>\n<li>3 protocols: baseline, normal diet, high-fat diet<\/li>\n<li>several time points (after birth)<\/li>\n<\/ul>\n<p><strong>Notes<\/strong><\/p>\n<ul>\n<li>When we compare the diets, the baseline can be ignored (no diet is applied there)<\/li>\n<li>Expression values of the same gene differ between samples due to:\n<ul>\n<li>different time points<\/li>\n<li>different treatment (diet)<\/li>\n<li>stochasticity<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h1>Understanding our data<\/h1>\n<h2>Descriptive quantities for the gene expression values<\/h2>\n<p>In R, it is very easy to calculate descriptive statistics:<\/p>\n<ul>\n<li>mean, variance, standard deviation, percentiles<\/li>\n<\/ul>\n<p>It is also very easy to summarize the data using:<\/p>\n<ul>\n<li>boxplots, histograms, densities<\/li>\n<\/ul>\n<h2>Visualization<\/h2>\n<p>With R, it is easy to visualize the data using:<\/p>\n<ul>\n<li>Principal Component Analysis<\/li>\n<li>Multidimensional Scaling<\/li>\n<li>Heatmaps<\/li>\n<\/ul>\n<h1>Hypothesis testing<\/h1>\n<p>Now, it&#8217;s time to perform some hypothesis testing:<\/p>\n<p>For example,<\/p>\n<ul>\n<li>geneA is expressed at a higher level in high-fat diet mice than in normal diet mice<\/li>\n<\/ul>\n<p>In general, to perform a hypothesis test we need:<\/p>\n<ul>\n<li>A hypothesis to test, that <strong>is formed prior to seeing the data<\/strong><\/li>\n<li>Data<\/li>\n<li>A statistic<\/li>\n<li>A theoretical distribution of the statistic under the null hypothesis<\/li>\n<li>A significance threshold<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Installing R R is a free and open source programming language. 
You can find download instructions here: https:\/\/www.r-project.org\/ Try to install it on your personal computer. Note: If you are using Linux (a Debian-based distribution), installing R is simple: sudo apt-get install r-base r-base-core Using R for the course If R is installed &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/pop-gen.eu\/wordpress\/statistics-in-bioinformatics-using-r\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Statistics in Bioinformatics using R&#8221;<\/span><\/a><\/p>\n","protected":false},"author":4,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-454","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pop-gen.eu\/wordpress\/wp-json\/wp\/v2\/pages\/454","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pop-gen.eu\/wordpress\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pop-gen.eu\/wordpress\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pop-gen.eu\/wordpress\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/pop-gen.eu\/wordpress\/wp-json\/wp\/v2\/comments?post=454"}],"version-history":[{"count":8,"href":"https:\/\/pop-gen.eu\/wordpress\/wp-json\/wp\/v2\/pages\/454\/revisions"}],"predecessor-version":[{"id":464,"href":"https:\/\/pop-gen.eu\/wordpress\/wp-json\/wp\/v2\/pages\/454\/revisions\/464"}],"wp:attachment":[{"href":"https:\/\/pop-gen.eu\/wordpress\/wp-json\/wp\/v2\/media?parent=454"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}