In my overview of Scholarly Inquiry Optimization (SIO), I claimed the future of scholarship lies not merely in Open Access publishing but in fitting research methodologies to the new cyber environment. I outlined several aspects of SIO that I would be covering. This post focuses on personally configured discovery in the research process.
By "personally configured" discovery, I am not referring to how you set parameters for your research, but to how intelligent research tools will configure themselves according to available data about you and your work. Through SIO standards, protocols can be established to verify that a given researcher has exposed sufficient and appropriate personal data to enable these emerging tools to do their work.
Before balking at the prospect of exposing personal data online, consider how routine this is within conventional scholarly inquiry. A grant proposal requires detailed information about the people who will conduct the research. Scholars submit their curriculum vitae and letters of recommendation to give funders a way to qualify them. Funders make judgments not only about the worthiness of the proposed research, but about the qualifications of the researchers and their fit with the proposed project.
In the future, the information about scholars needed by funding agencies will come more by way of intelligent harvesting of available personal data online than it will through an application form. Just as employers are discovering that Googling a prospective employee reveals relevant information not disclosed on a resume, so funding agencies will consult available public data about scholars seeking grants. SIO includes intelligent self-representation online. How one's intellectual record is manifested online could make all the difference between receiving funding and not.
This sounds as if I’m talking about reputation management. That is only partly true. A scholar's web presence certainly includes the extent to which conventional markers of reputation are evident (degrees earned, publications, academic posts, etc.). And there are new, digital reputation systems evolving, too. However, I am speaking more in terms of the utility of personal data for the semantic web (which is a separate question from the utility of personal data for the social web).
Scholarly Inquiry Optimization requires scholars to articulate their interests and identity for the semantic web. The semantic web is the machine-readable web. And just as data is being structured with metadata so that machines can manipulate that data more intelligently, so scholars (along with their institutions, organizations, and publications) must provide for appropriate structuring of the data representing scholars and their research. Just as a computer cannot “understand” the content of a PDF document, so computers that do not “know” their users will not be able to help them as intelligently as they might.
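To make "structuring the data representing scholars" a little more concrete, here is a minimal sketch of a machine-readable scholar profile expressed as JSON-LD with the schema.org vocabulary. The choice of schema.org is my assumption for illustration, not an SIO standard, and the name, institution, interests, and URL are all hypothetical placeholders.

```python
import json


def scholar_profile(name, affiliation, interests, profile_urls):
    """Build a schema.org Person description as a JSON-LD dictionary.

    The property names (name, affiliation, knowsAbout, sameAs) come from
    the schema.org Person type; everything passed in is illustrative.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "affiliation": {"@type": "Organization", "name": affiliation},
        "knowsAbout": list(interests),  # research interests, as plain text
        "sameAs": list(profile_urls),   # other pages describing the same person
    }


# Hypothetical scholar; none of these values refer to a real person or site.
profile = scholar_profile(
    name="Jane Scholar",
    affiliation="Example University",
    interests=["Renaissance rhetoric", "digital humanities"],
    profile_urls=["https://example.org/~jscholar"],
)

# Serialized, this could be embedded in a faculty page for machines to harvest.
print(json.dumps(profile, indent=2))
```

A profile like this says nothing a curriculum vitae does not already say; the difference is that a harvesting tool can read the fields without guessing at their meaning.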
A primitive way by which machines understand their users is via authentication. If you log into the university's intranet as a student, you get a page configured to your privileges and needs; if as a faculty member, you are taken to a page configured to that role. Now take this idea further. What if the library catalog “knew” not just your role, but your interests? It already does, if you’ve established a history of checking out books. This is dark knowledge, however, since no library catalog I know of is using the accumulated data about patron checkouts to structure more intelligent searches or provide recommendations. But that will come.
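As a rough sketch of what a catalog could do with that dark knowledge, the snippet below tallies the subject headings in a patron's checkout history and suggests unborrowed titles that share the most frequent subjects. The record layout, the function names, and the sample titles are all hypothetical; no actual library system is being described.

```python
from collections import Counter


def interest_profile(checkouts, top_n=3):
    """Return the patron's most frequent subject headings."""
    counts = Counter(
        subject for record in checkouts for subject in record["subjects"]
    )
    return [subject for subject, _ in counts.most_common(top_n)]


def suggest(catalog, checkouts, top_n=3):
    """Rank unborrowed catalog titles by overlap with the patron's interests."""
    interests = set(interest_profile(checkouts, top_n))
    borrowed = {record["title"] for record in checkouts}
    scored = []
    for book in catalog:
        if book["title"] in borrowed:
            continue  # skip what the patron has already read
        overlap = len(interests & set(book["subjects"]))
        if overlap:
            scored.append((overlap, book["title"]))
    # Highest overlap first; ties broken alphabetically for stable output.
    return [title for _, title in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Invented checkout history and catalog, for illustration only.
checkouts = [
    {"title": "Book A", "subjects": ["rhetoric", "renaissance"]},
    {"title": "Book B", "subjects": ["rhetoric", "pedagogy"]},
    {"title": "Book C", "subjects": ["renaissance", "printing"]},
]
catalog = checkouts + [
    {"title": "Book D", "subjects": ["rhetoric", "renaissance"]},
    {"title": "Book E", "subjects": ["printing"]},
    {"title": "Book F", "subjects": ["rhetoric"]},
]

suggestions = suggest(catalog, checkouts, top_n=2)
print(suggestions)
```

Counting subject headings is about the crudest profile imaginable, but it shows that the data already sitting in circulation records is enough to start from.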
Or consider all that email you’ve written. If you are a Gmail user, you know that Google parses your letters and serves up advertisements corresponding to the content of your mail. When I think of all the names of professors, all the bibliography, all the references to conferences or to a host of academic subjects that have filled the gigabytes of email I’ve sent out over the years, it is obvious to me that a machine could figure out a whole lot about me and then provide meaningful suggestions to me about subjects, people, events, or scholarly projects that I’d find compelling. Of course there are privacy concerns, and it may be that we don’t want automated scholarly profiling to follow a subject we’ve distanced ourselves from for whatever reason. But those objections can be answered either with software tweaks or by the clear benefits of opening up personal information to be analyzed by machines.
Those benefits are obvious to those actively using Amazon or Netflix. Both these commercial services have sophisticated recommendation systems based partly on social parameters ("Those who bought Isaac Asimov also liked Arthur C. Clarke's works"), but also based on purchase or rental history, plus your willingness to teach the system just what it is that you like. On Netflix you can go through a series of movies, rating them from one to five stars. After enough of these, Netflix starts to suggest films that you will probably like based on those preferences. There is something exciting and even a bit magical about being presented personal recommendations via iTunes or Amazon or Netflix. Why? Because it works. I have discovered many books and media items about which I would otherwise never have known through these systems, even though such systems are clearly rudimentary right now. It follows that for scholarly inquiry to be optimized, it will require the adoption and adaptation of systems like these, including "customers" willing to teach the machine what interests them. The SIO scholar will be one who provides plenty of such data, and in the most meaningful ways.
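The "those who bought X also liked Y" idea can be sketched with simple co-occurrence counting: items that often appear in the same person's history as something you already have are recommended to you. This is only an illustration of the general technique, not how Amazon or Netflix actually work, and it ignores star ratings entirely. The user names and histories are invented.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count how often each pair of items shares one person's history."""
    co = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(user, histories, top_n=3):
    """Suggest unseen items that co-occur most with the user's own items."""
    co = build_cooccurrence(histories)
    seen = set(histories[user])
    scores = Counter()
    for item in seen:
        for other, count in co[item].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked[:top_n]]


# Invented purchase histories, keyed by hypothetical user names.
histories = {
    "ana": ["Asimov", "Clarke"],
    "ben": ["Asimov", "Clarke", "Le Guin"],
    "cy": ["Asimov"],
    "dee": ["Le Guin", "Butler"],
}

picks = recommend("cy", histories)
print(picks)
```

The point for scholars is that the quality of the suggestions depends entirely on how much history the system has to work with, which is why the willingness to supply data matters so much.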
What will those ways be? How does a researcher provide the public data that will end up proving most meaningful to his or her research? Well, obviously quarantining scholarly publications behind restricted-access firewalls is not going to help. No scholar will be optimized for the digital age who does not fully expose his or her peer-reviewed publications. Open Access is a given. But there is more to it than that. Through blogs (like this one), both humans and machines can learn about a scholar’s interests. But scholarly processes are vital sources of information, too. Contrary to the print-paradigm mode of perfecting knowledge before presenting it publicly, in the new paradigm we scholars narrate our research as we go: wikis, microblogging, open laboratories, etc. reveal both to humans and machines what our developing interests are. Traditional scholars are troubled by this. But openness about the inquiry process may in fact produce the most valuable data one can generate as a scholar, sometimes even more valuable than the final scholarly product at which we were aiming. Open inquiry not only promotes socially aided research (which I'll post about later); it lets machines find your work and aid you in the development phases of research (the topic of my next post).
We are going to see the emergence of tools to assist scholars in better representing themselves. In the meantime, the optimized scholar of today needs to adjust to the mindset in which his or her activities (both research projects and more casual or creative adventures) are worth publicizing through the various means now available online. It will be those scholars who are lurking or searching (but never exposing their finished works, their works in progress, their interests or lives) who will find the machines far less accommodating for their research in the future. If you don’t talk to the machine, it just isn’t going to talk back.
Next up in this series on SIO: Contextual and Passive Search