I'm a mathematician trying to test some things on gene expression data, and I'm thus skimming over various articles such as Sotiriou et al. to understand what is typically done with such data sets. Several things confuse me; in particular, a paragraph in Sotiriou et al. reads:
"Clinical parameters such as ER status, [...] affect the behavior of breast cancers. We asked whether these clinical/pathologic characteristics were associated with differential gene expression. Parametric t tests identified 606 probe elements of 7,650 elements represented in our array that could segregate ER+ and ER- breast tumors (P < 0.001)."
As segregation of ER+/- based on gene expression is one of several things I'm interested in attempting to achieve through novel methods, I have been trying to understand what precisely is meant by the above paragraph. To recap the article, there are 99 patients, each with 7,650 probe expression values and one ER+/- value. The article sets out to determine which of those 7,650 probes successfully segregate the data set into ER+ and ER-.
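To make the setup concrete, here is a minimal sketch of one possible reading of the quoted sentence: an ordinary two-sample t test run separately for each probe, keeping the probes whose P-value falls below 0.001. This is only my guess, not something the article confirms. The data are synthetic, and the ER+ fraction, the number of truly different probes (`n_informative`) and the size of the mean shift are all made-up assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_patients, n_probes = 99, 7650   # sizes taken from the article
n_informative = 600               # hypothetical: probes whose mean truly differs

# Synthetic stand-in data, NOT the real data set.
er_positive = rng.random(n_patients) < 0.6       # hypothetical ER+/- labels
expr = rng.normal(size=(n_patients, n_probes))   # rows: patients, columns: probes
expr[er_positive, :n_informative] += 1.5         # shift the ER+ mean for some probes

# One two-sample t test per probe (column), comparing ER+ with ER- patients.
t_stat, p_val = stats.ttest_ind(expr[er_positive], expr[~er_positive], axis=0)

# Keep the probes whose P-value is below the threshold quoted in the article.
selected = np.flatnonzero(p_val < 0.001)
print(f"{selected.size} of {n_probes} probes have P < 0.001")
```

Under this reading, each probe gets its own P-value, and with 7,650 tests a handful of probes would be expected to pass the threshold by chance alone, even if no probe were related to ER status.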
I've run the above paragraph by a nearby statistician, and he could not for the life of him figure out what was done, and had not even heard of such a thing as a "parametric t test". This leads me to suspect that the term is specific to biology, so I ask: what is meant? It is also unclear to me (and him) what the P-value means in this context.
I hope the scope of this question isn't too broad. Of course I want to avoid asking "explain this article to me, the outsider, please"; I do believe the paragraph above is relatively self-contained in the context of gene expression.