NeitherRuleNorBeRuled on January 12, 2011, 02:37:18 pm

As compared to the real world, which has millions of tests.  They are not "controlled", but the sheer number of tests, combined with the orthogonal nature of the random differences among them, yields the desired results.  It's just a matter of analyzing the data available.

Quote
They are not uncorrelated, and analysing that data looks like a hell of a task. But it's worth a try. It's worth a considerable number of PhD dissertations.

This is a typical data-mining problem, and most of the work can be automated.  If someone can get a dissertation out of it, I would hold that the value of a PhD has diminished greatly.

Quote
[M]y point ... is that research is a tool that sometimes provides new results, and not particularly an end in itself. We are pretty good at making the right amount of some tools, like hammers and dichloroethane. Finding the right amount of research is harder. Terry tried a moral approach -- every individual consumer should fund as much research as he wants, and that's the amount of research we should have because that's how much people actually want. Never mind how much gives us the most success, this is the morally-correct approach.

That which is novel is fairly common outside of general scientific research.  It is subject to feedback loops just as other economic events are; the feedback is expressed in the rate and quality of successful results.  Some individuals will be more successful than others; although there is randomness (AKA "luck") involved, it is only one factor.


Quote
Meanwhile, basic research is something else. Find a better way to understand the world and everybody who depended on the old way of thinking might -- might -- find improvements based on the new understanding. What should the market pay for that? They won't find out how useful the new ideas are until they try them and adapt them for themselves. Most attempts to discover better understandings fail....

The "might" qualifier is largely based on the area in which the research is being done.  If it is an area in which a better understanding is seen to be of value, the likelihood of utility is much greater than if it is an area in which a better understanding is seen as having a lesser value.  The buyers in the market will provide information on what areas a better understanding would be seen to have greater value.

Quote
Would you expect that the feedback loops that determine the amount of innovation, would work as well as the feedback loops that decide how many hammers to make?

You are confusing things by referring to the "amount" of innovation.  What do you consider a unit of innovation?  I'm not sure that such a unit can be usefully defined, since this would force a total ordering of innovations.

This is even more complicated when one thinks in terms of "amount of research", since there is a third attribute: the success of the research.  The same amount of research (measured in general terms such as GAU spent or hours expended by those of similar expertise) in two different fields, or even two different approaches in the same field, could yield dramatically different results.

J Thomas on January 12, 2011, 02:59:54 pm

Quote
As compared to the real world, which has millions of tests.  They are not "controlled", but the sheer number of tests, combined with the orthogonal nature of the random differences among them, yields the desired results.  It's just a matter of analyzing the data available.

Quote
They are not uncorrelated, and analysing that data looks like a hell of a task. But it's worth a try. It's worth a considerable number of PhD dissertations.

This is a typical data-mining problem, and most of the work can be automated.  If someone can get a dissertation out of it, I would hold that the value of a PhD has diminished greatly.

We must be thinking about very different things. What I'm thinking about is to find actual research budgets (not the fudged figures that the Federal Government receives for tax purposes) and somehow measure the results in terms of increased sales. This requires a great deal of creativity. Probably it requires a lot of skill to hack into the necessary files, and a tremendous amount of creativity in interpreting the data.

Incidentally, did you know that professional mapmakers add various dead-end streets which are not there? It doesn't matter to customers who want to get to places which are there. But if somebody copies their maps and tries to sell those maps as their own work, the copied wrong data is proof that they copied their map and did not create it fresh.

Somewhat similarly, when a mass mailer buys a mailing list, a few of the addresses belong to the seller. Every time the mailer uses the list, they send their mailings to the seller too. So if the mailer copies the list to somebody else who uses it, they will send proof that they used it to the seller, who does not publish that address anywhere else.

Do you know how much this sort of thing generalizes?
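To make the mailing-list version concrete, here is a toy sketch of how a seller might salt a list with canary addresses and later test a suspect copy. The addresses and function names are invented for the illustration.

Code:
# Toy sketch of the seed-address trick described above.  Every address and
# function name here is invented; a real seller keeps the canaries secret.
import random

def salt_list(addresses, canaries):
    """Return a sale-ready copy of the list with the seller's canary
    addresses mixed in at random positions."""
    salted = list(addresses) + list(canaries)
    random.shuffle(salted)
    return salted

def copied_evidence(suspect_list, canaries):
    """Canaries found in a list that was never sold to this party are
    evidence that it was copied from a salted original."""
    return sorted(set(suspect_list) & set(canaries))

customers = ["alice@example.com", "bob@example.com", "carol@example.com"]
canaries = ["trap-01@example.net", "trap-02@example.net"]

sold_copy = salt_list(customers, canaries)
print(copied_evidence(sold_copy, canaries))   # both traps show up

The buyer's mailings still reach every real customer, which is why the salt does not interfere with the buyer's stated purpose.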

Quote
Quote
[M]y point ... is that research is a tool that sometimes provides new results, and not particularly an end in itself. We are pretty good at making the right amount of some tools, like hammers and dichloroethane. Finding the right amount of research is harder. Terry tried a moral approach -- every individual consumer should fund as much research as he wants, and that's the amount of research we should have because that's how much people actually want. Never mind how much gives us the most success, this is the morally-correct approach.

That which is novel is fairly common outside of general scientific research.  It is subject to feedback loops just as other economic events are; the feedback is expressed in the rate and quality of successful results.  Some individuals will be more successful than others; although there is randomness (AKA "luck") involved, it is only one factor.

When you measure the result of attempting to make and sell hammers, you get a clear answer. You know how many hammers got sold last month versus the same month a year earlier. When you measure the result of innovation you have a more complex task.

Quote
Quote
Meanwhile, basic research is something else. Find a better way to understand the world and everybody who depended on the old way of thinking might -- might -- find improvements based on the new understanding. What should the market pay for that? They won't find out how useful the new ideas are until they try them and adapt them for themselves. Most attempts to discover better understandings fail....

The "might" qualifier is largely based on the area in which the research is being done.  If it is an area in which a better understanding is seen to be of value, the likelihood of utility is much greater than if it is an area in which a better understanding is seen as having a lesser value.  The buyers in the market will provide information on what areas a better understanding would be seen to have greater value.

That depends very largely on how well the new knowledge is marketed.

Quote
Quote
Would you expect that the feedback loops that determine the amount of innovation, would work as well as the feedback loops that decide how many hammers to make?

You are confusing things by referring to the "amount" of innovation.  What do you consider a unit of innovation?  I'm not sure that such a unit can be usefully defined, since this would force a total ordering of innovations.

This is even more complicated when one thinks in terms of "amount of research", since there is a third attribute: the success of the research.  The same amount of research (measured in general terms such as GAU spent or hours expended by those of similar expertise) in two different fields, or even two different approaches in the same field, could yield dramatically different results.

So does it look like a simple data-mining effort? I have the impression you were talking about something else entirely.

mellyrn on January 12, 2011, 03:05:41 pm
Quote
Minor nit, I know, but J Thomas is not saying that it is unknowable; rather, he is saying he does not know how to discover it.

Fair enough; let it read "unknowable for all practical purposes". 

J Thomas on January 12, 2011, 04:22:21 pm
Quote
Minor nit, I know, but J Thomas is not saying that it is unknowable; rather, he is saying he does not know how to discover it.

Fair enough; let it read "unknowable for all practical purposes". 

Perhaps the right kind of research would discover this. I don't see how, but if I understood it all ahead of time then we wouldn't need research. Whenever they had a question people could just ask me.

NeitherRuleNorBeRuled on January 12, 2011, 08:10:32 pm
Quote
We must be thinking about very different things. What I'm thinking about is to find actual research budgets (not the fudged figures that the Federal Government receives for tax purposes) and somehow measure the results in terms of increased sales. This requires a great deal of creativity. Probably it requires a lot of skill to hack into the necessary files, and a tremendous amount of creativity in interpreting the data.

So you propose doing this via "crackers"?   :)

Unless there is reason to think that the gap between the real and fudged values varies significantly with funding level, the fudged numbers should still be informative.

Of course, there would be an additional problem in determining what is and is not "research".  A trivial example would be two groups, one of which includes lab maintenance costs under research while another puts them in a separate cost center.

Personally, I wouldn't worry about the amount of money that goes into it, but rather the quantity of research output. 

Quote
Incidentally, did you know that professional mapmakers add various dead-end streets which are not there? It doesn't matter to customers who want to get to places which are there. But if somebody copies their maps and tries to sell those maps as their own work, the copied wrong data is proof that they copied their map and did not create it fresh.

Somewhat similarly, when a mass mailer buys a mailing list, a few of the addresses belong to the seller. Every time the mailer uses the list, they send their mailings to the seller too. So if the mailer copies the list to somebody else who uses it, they will send proof that they used it to the seller, who does not publish that address anywhere else.

Do you know how much this sort of thing generalizes?

Interesting digression, and I was familiar with both those examples.  It's a matter of adding known quantities to an unknown set to estimate, once "processed", what had actually been done.  A couple of techniques like this have been tried in Software Engineering:

  • Bebugging -- adding bugs to source code to measure the effectiveness of a code review.  It turns out it is too expensive in practice, since adding bugs at appropriate levels and complexity takes nearly as much work as writing the software itself.
  • Fault Insertion -- adding code to optionally generate internal error conditions to ensure that error-handling code is exercised and behaves correctly.  This technique is actually useful in practice.  The cognate technique is also used in hardware; sometimes circuit boards have solder pads that are used to place probes for injecting unexpected faults.  (A minimal software sketch follows this list.)
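A minimal sketch of the fault-insertion idea, not any particular tool's interface; the environment variable, fault names, and functions below are invented for the example.

Code:
# Minimal fault-insertion sketch.  Faults are switched on by name through an
# environment variable, e.g.  INJECT_FAULTS=disk_full python app_under_test.py
# The variable name, fault names, and functions are invented for this example.
import os

ENABLED_FAULTS = set(filter(None, os.environ.get("INJECT_FAULTS", "").split(",")))

def maybe_fail(name, exc):
    """Raise exc if the named fault is enabled; otherwise do nothing."""
    if name in ENABLED_FAULTS:
        raise exc

def save_report(path, text):
    try:
        # The injected fault lands on the same path a real disk-full error would.
        maybe_fail("disk_full", OSError("injected fault: disk full"))
        with open(path, "w") as handle:
            handle.write(text)
        return True
    except OSError:
        # This is the error-handling branch the injected fault exercises.
        return False

With the fault switched on, save_report() takes its recovery branch without the test having to fill a real disk.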
Quote
When you measure the result of attempting to make and sell hammers, you get a clear answer. You know how many hammers got sold last month versus the same month a year earlier. When you measure the result of innovation you have a more complex task.

You get a precise answer for hammers; that may or may not be an accurate indicator of anything, however.  With innovation you will get a less precise answer, but that may be just as accurate.

Quote
That depends very largely on how well the new knowledge is marketed.

Same as anything else.

Quote
So does it look like a simple data-mining effort? I have the impression you were talking about something else entirely.

The data mining is relatively trivial.  Getting good data may not be possible (approximate data would be easier, but less satisfying), and I strongly suspect that once that was done the correlations you would like to see would not be there.
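For what it's worth, the mechanical part is indeed short once trustworthy per-firm figures exist, which is the part in dispute; the numbers below are invented solely to show the shape of the calculation.

Code:
# Sketch of the analysis under discussion: per-firm research spending versus
# later sales growth.  Every figure is invented; collecting real, un-fudged
# inputs is the hard problem, not this arithmetic.
from statistics import correlation   # Pearson's r, available in Python 3.10+

# (research spend as % of revenue, sales growth two years later in %)
firms = [
    (2.0, 3.1), (5.5, 9.0), (1.0, 2.5), (8.0, 4.2),
    (3.5, 6.8), (0.5, 1.0), (6.0, 12.5), (4.0, -1.0),
]
spend = [s for s, _ in firms]
growth = [g for _, g in firms]

print(f"Pearson r = {correlation(spend, growth):.2f}")
# A single r says nothing about which fields or approaches paid off, which is
# the earlier objection to measuring the "amount" of research.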

This all goes to my contention that the only legitimate definition of optimal spending is that which individuals, free from coercion, make.  It is utterly pragmatic, not to mention "moral".  No other definition even hinted at in this conversation can be considered well-defined.  You could run your 10K world scenario, and the ways a given funding level was spent could produce dramatically different results (factors of 10, 100, 1000).

If you want more resources applied to research, donate them.  Period.

J Thomas on January 13, 2011, 11:04:43 am

Quote
Quote
Incidentally, did you know that professional mapmakers add various dead-end streets which are not there? It doesn't matter to customers who want to get to places which are there. But if somebody copies their maps and tries to sell those maps as their own work, the copied wrong data is proof that they copied their map and did not create it fresh.

Somewhat similarly, when a mass mailer buys a mailing list, a few of the addresses belong to the seller. Every time the mailer uses the list, they send their mailings to the seller too. So if the mailer copies the list to somebody else who uses it, they will send proof that they used it to the seller, who does not publish that address anywhere else.

Do you know how much this sort of thing generalizes?

Interesting digression, and I was familiar with both those examples.  It's a matter of adding known quantities to an unknown set to estimate, once "processed", what had actually been done.  A couple of techniques like this have been tried in Software Engineering:

  • Bebugging -- adding bugs to source code to measure the effectiveness of a code review.  It turns out it is too expensive in practice, since adding bugs at appropriate levels and complexity takes nearly as much work as writing the software itself.

Yes. It's easy to insert random errors, but many of those are easy to catch. Coming up with errors that are as hard to catch as the ones that haven't been caught is much harder.

Anyway, here's my speculative suspicion. In both cases the seller of the information adds false information which will not particularly inconvenience the buyer. When you use a map you don't care about extra dead-end streets because you use the map to get to places which are real. You don't care about a few duds in your mailing list when you'll feel it's unusually successful if it gets a 2% response rate.

How much does that generalize? If people who sell information often include false information which lets them monitor how it's used, but which does not interfere with the buyer's stated purpose....

And later someone tries to use the data for some other purpose entirely, wanting to data-mine it. Now there is no guarantee at all that the intentional false data will not interfere with the new purpose.

Of course the false data is likely to be outliers, so if you discard the outliers anyway it won't cause any trouble.  hehehe.

And it would be only a small fraction of the total.

I just got a big picture of businesses selling each other data, and in the extreme case all the data which has gone through the market is intentionally falsified. And then they are all doing data-mining of various sorts, when to get good results they need to go back to the source and buy new datasets that are falsified in ways that should not interfere with their new purpose. (But if you have two versions of the same data that are falsified in different ways, can you filter some of it out? Would they agree to do that?)
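Mechanically, the comparison in that parenthetical is easy, assuming both copies could ever be obtained; a toy sketch with invented street names:

Code:
# Toy sketch: given two independently salted copies of the "same" data,
# records appearing in only one copy are the prime suspects for being fakes.
# The street names here are invented.
def suspects(copy_a, copy_b):
    """Return streets listed in exactly one of the two map copies."""
    streets_a = {rec["street"] for rec in copy_a}
    streets_b = {rec["street"] for rec in copy_b}
    return sorted(streets_a ^ streets_b)   # symmetric difference

copy_a = [{"street": "Oak Ave"}, {"street": "Trap St"}]       # seller A's fake
copy_b = [{"street": "Oak Ave"}, {"street": "Phantom Ct"}]    # seller B's fake
print(suspects(copy_a, copy_b))   # ['Phantom Ct', 'Trap St']

Whether the sellers would ever let their copies be compared is, of course, the open question.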

Data which does not have to be sold has a better chance to be correct? But not government data -- people lie to the government whenever they think the truth might get them in trouble.
« Last Edit: January 13, 2011, 11:09:26 am by J Thomas »

     
