Current Projects

Assessment of user-generated content by experts - We are currently working on a study in which experts, such as virtual reference librarians, assess the quality of content from user-generated content (UGC) sites such as question-answering sites and blogs. In this study, we obtain qualitative and quantitative assessments from the librarians to build a conceptual framework for evaluating the quality of content across platforms.

Content Quality Plugin - We are building a browser-based plugin that automatically shows the quality of content on user-generated content sites in real time. Currently, the plugin works on the question-answering site Brainly, where it automatically identifies the quality of content and displays it with the help of visualizations.

Context-Independent Content Evaluation - We are in the Internet age, where “content” is everywhere and is often used for decision making without knowledge of the context in which it was created. This research aims to understand how we, as users of user-generated content, can assess the quality of content both with and without knowing the context. Amazon Mechanical Turk workers will evaluate the quality of answers, with and without the corresponding questions, based on certain criteria. We hope to report interesting results at the end of this study.

Question Quality in Educational Q&A - Community Question Answering (CQA) sites have been heralded as places where Web users exchange information and interact with peers to get crowdsourced answers to questions asked by members of the site. However, these venues of interaction have frequently transformed into "informal academic environments" or "knowledge hubs." In many scenarios, CQA sites have been renamed and retailored as habitats that provide online users with a quintessential "virtual classroom" environment to satisfy their information needs. Therefore, there is a compelling need to assess the content present on CQA sites. This content needs to meet a high standard, providing sufficient and accurate information to satisfy users. The presence of "bad" content not only decreases user traffic but also impedes the learning process of online learners. It is critical to evaluate and tease out the reasons that make content "bad." By fixing bad content, CQA sites would give students an opportunity to learn how to phrase their questions better and thus satisfy the information needs of users in the future. This problem applies mainly to the educational context and will guide interdisciplinary research that bridges Information Science and Learning Sciences.

Learning in search: an exploratory field study - In this user study, we investigate information seeking as a learning process. We present the participants with learning-related search tasks, which have been designed to represent different cognitive levels of learning. This research also examines how broadening the users' scope of information sources (e.g., web search, offline seeking, crowdsourcing) may influence their search behavior and learning outcomes. Through the analysis of both the users' web log data and the field reports, we attempt to capture how users seek information both online and offline, as well as the behavioral patterns they exhibit while completing each task at different levels of cognitive complexity. This field study, conducted over a period of two weeks, may reveal insights with regard to users' learning and information-seeking processes.

Google Analytics Project - Brainly is a social learning community for students and educators that encourages collaborative learning and sharing of knowledge in different academic subject areas. It has both registered users and unregistered visitors. The online activities of all its users are logged in a cloud-based big-data warehouse. Desalegn works with the Brainly team on descriptive analytics of online user behavior in relation to different Community Question Answering (CQA) and website metrics such as subject areas, session duration, and bounce rate. After analyzing the user behavior, he will work on predictive analytics based on the historical data, using machine learning and deep learning.

Past Projects

Question Difficulty - The primary objective of this research was to propose a method for categorizing questions by difficulty level in Community Question-Answering (CQA) sites. The problem investigated in this research is significant because the same method can also be applied to categorizing questions by grade level. For this project, Long worked on Biology and Physics questions posted on the educational Q&A site Brainly. This research can be integrated with user-profile matching and question recommendation processes in CQA sites.

Retrieving Rising Stars in Focused Community Question-Answering - In Community Question Answering (CQA) forums, there is typically a small fraction of users who provide high-quality posts and earn a very high reputation status from the community. These top contributors are critical to the community since they drive the development of the site and attract traffic from Internet users. Identifying these individuals could be highly valuable, but this is not an easy task. Unlike publication or social networks, most CQA sites lack information regarding peers, friends, or collaborators, which can be a major indicator signaling future success or performance. In this project, we attempt to perform this analysis by extracting different sets of features to predict future contribution.

Retrieving People: Identifying Potential Answerers in Community Question-Answering - Community Question-Answering (CQA) sites have become popular venues where people can ask questions, seek information, or share knowledge with the community. While responses on CQA sites are obviously slower than information retrieval by a search engine, one of the most frustrating experiences for an asker is when a posted question does not receive a reasonable answer or remains unanswered. CQA sites could improve the user's experience by identifying potential answerers and routing appropriate questions to them. Finding potential answerers increases the chance that a question is answered, and answered more quickly. In this project, we predict the potential answerers based on question content and user profiles. Our approach builds user profiles based on past activity. When a new question is posted, the proposed method computes a score between the question and each user profile to find the potential answerers.
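
The matching step described above can be illustrated with a minimal sketch. It rests on assumptions that are not part of the project description: each user profile is taken to be the concatenated text of that user's past posts, and the score is the cosine similarity between TF-IDF vectors. The function names (tokenize, tfidf_vectors, cosine, rank_answerers) and the sample data are hypothetical and chosen only for illustration; the project's actual features and scoring function may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Raw term count times a smoothed inverse document frequency.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_answerers(question, profiles):
    """Score the question against every profile; highest score first.

    `profiles` maps a user name to the text of that user's past activity
    (an assumed representation of a user profile).
    """
    users = list(profiles)
    vectors = tfidf_vectors([profiles[user] for user in users] + [question])
    question_vec = vectors[-1]
    scores = [(user, cosine(question_vec, vec))
              for user, vec in zip(users, vectors[:-1])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical profiles built from past activity.
profiles = {
    "alice": "photosynthesis chlorophyll plant cells mitosis biology",
    "bob": "velocity acceleration newton force momentum physics",
    "carol": "algebra equations quadratic factoring polynomial",
}
question = "How does force relate to acceleration in Newton's second law?"

ranked = rank_answerers(question, profiles)
for user, score in ranked:
    print(f"{user}: {score:.3f}")
```

In this toy example only bob's profile shares terms with the question, so bob is ranked first and the other two users score zero. A site could route the question to the top few users in the ranking.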

Investigating Features that Contribute to Increasing Question Quality in Social Q&A - The study examined what features contribute to creating a quality question within the SQA platform Yahoo! Answers. Specifically, the study examined how textual features, such as answer length and TF/IDF frequency, contribute to determining question quality, using non-textual features derived from human assessors as a quality standard. In addition, when a question was of poor quality, we asked why, by having assessors rate question quality along a series of factors, such as clarity and complexity, and mapping textual features to these factors. By taking this approach, we derived an automated model that can not only predict question quality but also provide reasons why a question is of poor quality, which would yield suggestions for how to improve the question.

Investigating Motivations and Expectations of Asking a Question in Social Q&A - Social Q&A (SQA) has rapidly grown in popularity, impacting people's information seeking behavior. Although research has paid much attention to a variety of characteristics within SQA to investigate how people seek and share information, fundamental questions of user motivations and expectations for information seeking behaviors in SQA remain: Why do people use SQA to ask a question? What are the users' expectations with respect to the responses to their questions? The current study applied the theoretical framework of uses and gratification to investigate the motivations for SQA use, and adapted criteria people employ to evaluate information from previous literature in order to investigate expectations with regard to evaluation of content within SQA.

Questioning the Question: Addressing the Answerability of Questions in Community Question-Answering - Most research on community question-answering (cQA) services focuses on answer ranking, retrieval, and assessment, with little attention given to question quality. A drawback to these studies is their assumption that the questions asked are of suitable quality to potentially receive a good answer in the first place. Yet evidence indicates this is not the case and that question type and content affect the number and quality of answers received. In this research project, Dr. Chirag Shah, Erik, and Vanessa focus on questions posted on Yahoo! Answers to investigate what factors contribute to the goodness of a question and determine if we can identify bad questions in order to allow askers to revise them before posting.

Investigating Failed Questions in Social Q&A - This research focuses on identifying the statistical characteristics of both the content of a failed question and information about the asker who posted it, to see whether this set of characteristics makes failed questions statistically significantly different from well-answered questions. Additionally, we compared the prediction model built upon these statistical characteristics with a recent qualitative analysis, which focused on constructing the criteria humans use to evaluate a question's likelihood of failure, in order to study how a machine prediction model differs from human judgment.

Measuring Content Quality in Online Q&A - The primary objective of this research is to propose and evaluate a model of content quality in online Q&A environments, which include expert-based Q&A services such as IPL and Ask-a-Librarian, as well as social/community Q&A services such as Yahoo! Answers and WikiAnswers. Vanessa worked on this as a part of the OCLC/ALISE research grant award for 2011.