Wednesday, December 19, 2012
Conference Report: Workshop on Internet and Network Economics (WINE) 2012
Google regularly participates in the WINE conference: the Workshop on Internet & Network Economics. WINE’12 took place last week in Liverpool, UK, home to a strong economics and computation group. WINE provides a forum for researchers across various disciplines to examine interesting algorithmic and economic problems of mutual interest that have emerged from the Internet over the past decade. For Google, the exchange of ideas at this selective workshop has resulted in innovations and improvements in our algorithms and auction design, such as our display ad allocation system.
Googlers co-authored three papers this year; here’s a synopsis of each, as well as some highlights from invited talks at the conference:
Budget Optimization for Online Campaigns with Positive Carryover Effects
This paper first argues, citing an earlier WWW ’10 paper, that ad impressions may have a long-term impact on user behavior. Based on this motivation, it presents a scalable budget optimization algorithm for online advertising campaigns in the presence of Markov user behavior. In such settings, showing an ad to a user may change their future actions through a Markov model, and the probability of conversion depends not only on the last ad shown but also on earlier user activities. The main contribution of the paper is a simpler algorithm for solving a constrained Markov Decision Process, and the authors confirm this simpler solution via simulations on advertising data sets. The paper was written when Nikolay Archak, a PhD student at the NYU business school, was an intern with the New York market algorithms research team.
On Fixed-Price Marketing for Goods with Positive Network Externalities
This paper presents an approximation algorithm for marketing “networked goods” and services that exhibit positive network externalities, that is, goods or services whose value to a buyer increases as other buyers own the good or use the service. Such positive network externalities arise in many products, like operating systems or smartphone services. While most previous research is concerned with influence maximization, this paper attempts to identify a revenue-maximizing marketing strategy for such networked goods, as follows: the seller selects a set (S) of buyers and gives them the good for free, then sets a fixed per-unit price (p) at which other consumers can buy it. The strategy is consistent with practice and is easy to implement. The authors use ideas from non-negative submodular maximization to find the optimal revenue-maximizing fixed-price marketing strategy.
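As background on the underlying technique (and emphatically not the paper's own algorithm), a generic local-search sketch for unconstrained non-negative submodular maximization looks roughly like the Python below; the objective f and the toy cut function are placeholders I made up for illustration.

# Illustrative local search for unconstrained non-negative submodular maximization.
# This sketches the general technique only; it is not the algorithm from the paper,
# and the objective function f is a placeholder.
def local_search_submodular(ground_set, f, eps=1e-9):
    """Return a set S that no single add or remove can improve."""
    S = set()
    improved = True
    while improved:
        improved = False
        for x in ground_set - S:          # try adding an element
            if f(S | {x}) > f(S) + eps:
                S.add(x)
                improved = True
        for x in set(S):                  # try removing an element
            if f(S - {x}) > f(S) + eps:
                S.remove(x)
                improved = True
    return S

# Toy example: a graph-cut function, which is non-negative and submodular.
nodes = {1, 2, 3, 4}
edges = {(1, 2), (2, 3), (3, 4), (1, 4)}
cut = lambda S: sum(1 for (u, v) in edges if (u in S) != (v in S))
print(local_search_submodular(nodes, cut))   # finds a locally optimal cut, e.g. {1, 3}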
The AND-OR game: Equilibrium Characterization
Yishay Mansour, a former Visiting Faculty member at Google New York, presented the results. He first argued that the existence and uniqueness of market equilibria are only known for markets with divisible goods and concave or convex utilities. Then he described a simple AND-OR market game for divisible goods. To my surprise, he showed that a class of mixed strategies is essentially the unique set of randomized equilibria for this market (up to minor changes in the outcome). At the end, Yishay challenged the audience to give such a characterization for more general markets with indivisible goods.
Kamal Jain of Ebay Research gave an interesting talk about mechanism design problems inspired by applications at companies like Ebay and Google. In one part, Kamal proposed "coopetitive ad auctions" for settings in which the auctioneer runs an auction among buyers who may cooperate with some advertisers while competing with others for advertising slots. He gave context around "product ads"; for example, a retailer like Best Buy may cooperate with a manufacturer like HP to put out a product ad for an HP computer sold at Best Buy. Kamal argued that if the cooperation is not an explicit part of the auction, an advertiser may implicitly end up competing with itself, thus decreasing the social welfare. By making the cooperation an explicit part of the auction, he was able to design a mechanism with better social welfare and revenue properties compared to both first-price and second-price auctions. Kamal also discussed optimal mechanisms for intermediaries, and “surplus auctions” that avoid the cyclic bidding behavior resulting from running naive variants of first-price auctions in repeated settings.
David Parkes of Harvard University discussed techniques to combine mechanism design with machine learning or heuristic search algorithms. At one point David discussed how to implement a branch-and-bound search algorithm in a way that results in a "monotone" allocation rule, so that if we implement a VCG-type allocation and pricing rule based on this allocation algorithm, the resulting mechanism becomes truthful. David also presented ways to compute a set of prices for any allocation, respecting incentive compatibility constraints as much as possible. Both of these topics appeared in ACM EC 2012 papers that he had co-authored.
At the business meeting, there was a proposal to change the title of the conference from “workshop” to “conference” or “symposium” to reflect its fully peer-reviewed and archival nature, keeping the same acronym of WINE. (Changing the title to “Symposium on the Web, Internet, and Network Economics” was rejected: SWINE!) WINE 2013 will be held at Harvard University in Boston, MA, and we look forward to reconnecting with fellow researchers in the field and continuing to nurture new developments and research topics.
Tuesday, December 18, 2012
Using online courses in Spain to teach entrepreneurship
Posted by Francisco Ruiz Anton, Policy Manager, Google Spain
Cross-posted with the Policy by the Numbers Blog
At the end of the third quarter of 2012, roughly 25% of adults in Spain were out of work, and more than half of adults under 24 were unemployed. Recent graduates and young adults preparing to enter the workforce face the toughest job market in decades.
The Internet presents an opportunity for growth and economic development. According to recent research, more than 100,000 jobs in Spain originate from the Internet, and it contributes 26.7 billion euros (2.5%) directly to GDP. That impact could triple by 2015 under the right conditions.
One of those conditions is making high-quality education accessible, a point echoed by a recent OECD report on the youth labor market in Spain. This is no easy task. University degrees are in high demand, straining the reach of our existing institutions.
The web has become a way for learners to develop new skills when traditional institutions aren’t an option. Recent courses on platforms like Udacity, Coursera and edX have seen hundreds of thousands of students enroll and participate in courses taught by prestigious professors and lecturers.
Google is partnering with numerous organizations and universities in Spain to organize UniMOOC, an online course intended to educate citizens in Spain and the rest of the Spanish-speaking world about entrepreneurship. It was built with Course Builder, Google’s new open source toolkit for constructing online courses.
To date nearly 10,000 students have registered for the course, over two-thirds of them from Spain and the rest from 93 other countries. It recently won the 2012 “Most innovative project” award from the newspaper El Mundo.
Spain’s situation is not entirely unique in Europe. Policymakers across the continent are asking themselves how best to create economic opportunity for their citizens, and how to ensure that their best and brightest students are on a path toward financial success. Our hope is that the people taking this course will be more empowered with the right skills and tools to start their own businesses that can create jobs. They will push not only Spain, but Europe and the rest of the world towards economic recovery and growth.
The course is still running, and you’re able to join today.
Monday, December 17, 2012
Millions of Core-Hours Awarded to Science
Posted by Andrea Held, Program Manager, University Relations
In 2011 Google University Relations launched a new academic research awards program, Google Exacycle for Visiting Faculty, offering up to one billion core-hours to qualifying proposals. We were looking for projects that would consume 100M+ core-hours each and be of critical benefit to society. Not surprisingly, there was no shortage of applications.
Since then, the following seven scientists have been working on-site at Google offices in Mountain View and Seattle. They are here to run large computing experiments on Google’s infrastructure to change the future. Their projects include exploring antibiotic drug resistance, protein folding and structural modelling, drug discovery, and last but not least, the dynamic universe.
Today, we would like to introduce the Exacycle award recipients and their work. Please stay tuned for updates next year.
Simulating a Dynamic Universe with the Large Synoptic Sky Survey
Jeff Gardner, University of Washington, Seattle, WA
Collaborators: Andrew Connolly, University of Washington, Seattle, WA, and John Peterson, Purdue University, West Lafayette, IN
Research subject: The Large Synoptic Survey Telescope (LSST) is one of the most ambitious astrophysical research programs ever undertaken. Starting in 2019, the LSST’s 3.2 Gigapixel camera will repeatedly survey the southern sky, generating tens of petabytes of data every year. The images and catalogs from the LSST have the potential to transform both our understanding of the universe and the way that we engage in science in general.
Exacycle impact: In order to design the telescope to yield the best possible science, the LSST collaboration has undertaken a formidable computational campaign to simulate the telescope itself. This will optimize how the LSST surveys the sky and provide realistic datasets for the development of analysis pipelines that can operate on hundreds of petabytes. Using Exacycle, we are reducing the time required to simulate one night of LSST observing, roughly 5 million images, from 3 months down to a few days. This rapid turnaround will enable the LSST engineering teams to test new designs and new algorithms with unprecedented precision, which will ultimately lead to bigger and better science from the LSST.
Designing and Defeating Antibiotic Drug Resistance
Peter Kasson, Assistant Professor, Departments of Molecular Physiology and Biological Physics and of Biomedical Engineering, University of Virginia
Research subject: Antibiotics have made most bacterial infections routinely treatable. As antibiotic use has become common, bacterial resistance to these drugs has also increased. Recently, some bacteria have arisen that are resistant to almost all antibiotics. We are studying the basis for this resistance, in particular the enzyme that acts to break down many antibiotics. Identifying the critical changes required for pan-resistance will aid surveillance and prevention; it will also help elucidate targets for the development of new therapeutic agents.
Exacycle impact: Exacycle allows us to simulate the structure and dynamics of several thousand enzyme variants in great detail. The structural differences between enzymes from resistant and non-resistant bacteria are subtle, so we have developed methods to compare structural "fingerprints" of the enzymes and identify distinguishing characteristics. The complexity of this calculation and large number of potential bacterial sequences mean that this is a computationally intensive task; the massive computing power offered by Exacycle in combination with some novel sampling strategies make this calculation tractable.
Sampling the conformational space of G protein-coupled receptors
Kai Kohlhoff, Research Scientist at Google
Collaborators: Research labs of Vijay Pande and Russ Altman at Stanford University
Research subject: G protein-coupled receptors (GPCRs) are proteins that act as signal transducers in the cell membrane and influence the response of a cell to a variety of external stimuli. GPCRs play a role in many human diseases, such as asthma and hypertension, and are well established as a primary drug target.
Exacycle impact: Exacycle let us perform many tens of thousands of molecular simulations of membrane-bound GPCRs in parallel using the Gromacs software. With MapReduce, Dremel, and other technologies, we analyzed the hundreds of terabytes of generated data and built Markov State Models. The information contained in these models can help scientists design drugs that have higher potency and specificity than those presently available.
Results: Our models let us explore kinetically meaningful receptor states and transition rates, which improved our understanding of the structural changes that take place during activation of a signaling receptor. In addition, we used Exacycle to study the affinity of drug molecules when binding to different receptor states.
Modeling transport through the nuclear pore complex
Daniel Russel, post doc in structural biology, University of California, San Francisco
Research subject: Our goal is to develop a predictive model of transport through the nuclear pore complex (NPC). Developing the model requires understanding how the behavior of the NPC varies as we change the parameters governing the components of the system. Such a model will allow us to understand how transportins, the unstructured domains, and the rest of the cellular milieu interact to determine the efficiency and specificity of macromolecular transport into and out of the nucleus.
Exacycle impact: Since data describing the microscopic behavior of most parts of the nuclear transport process is incomplete and contradictory, we have to explore a larger parameter space than would be feasible with traditional computational resources.
Status: We are currently modeling various experimental measurements of aspects of the nuclear transport process. These experiments range from simple ones containing only a few components of the transport process to measurements on the whole nuclear pore with transportins and cellular milieu.
Large scale screening for new drug leads that modulate the activity of disease-relevant proteins
James Swetnam, Scientific Software Engineer, drugable.org, NYU School of Medicine
Collaborators: Tim Cardozo, MD, PhD - NYU School of Medicine.
Research subject: We are using a high-throughput, CPU-bound procedure known as virtual ligand screening to ‘dock’ a large sample of bioactive chemical space against the entirety of known protein structures, that is, to produce rough estimates of binding energy for each pairing. Our goal is the first computational picture of how bioactive chemistry with therapeutic potential can affect human and pathogen biology.
Exacycle Impact: Typically, using our academic lab’s resources, we could screen a few tens of thousands of compounds against a single protein to try to find modulators of its function. To date, Exacycle has enabled us to screen 545,130 compounds against 8,535 protein structures involved in important and underserved diseases such as cancer, diabetes, malaria, and HIV, to look for new leads toward future drugs.
Status: We are currently expanding our screens to an additional 206,190 models from ModBase. We aim to have a public dataset for the research community in the first half of 2013.
Protein Structure Prediction and Design
Michael Tyka, Research Fellow, University of Washington, Seattle, WA
Research subject: The precise relationship between the primary sequence and the three dimensional structure of proteins is one of the unsolved grand challenges of computational biochemistry. The Baker Lab has made significant progress in recent years by developing more powerful protein prediction and design algorithms using the Rosetta Protein Modelling suite.
Exacycle impact: Limitations in the accuracy of the physical model and lack of sufficient computational power have prevented solutions to broader classes of medically relevant problems. Exacycle allows us to improve model quality by conducting large parameter optimization sweeps with a very large dataset of experimental protein structural data. The improved energy functions will benefit the entire theoretical protein research community.
We are also using Exacycle to conduct simultaneous docking and one-sided protein design to develop novel protein binders for a number of medically relevant targets. For the first time, we are able to aggressively redesign backbone conformations at the binding site. This allows for a much greater flexibility in possible binding shapes but also hugely increases the space of possibilities that have to be sampled. Very promising designs have already been found using this method.
Thursday, December 13, 2012
Continuing the quest for future computer scientists with CS4HS
Erin Mindell, Program Manager, Google Education
Computer Science for High School (CS4HS) began five years ago with a simple question: How can we help create a much-needed influx of CS majors into universities and the workforce? We took our questions to three of our university partners (University of Washington, Carnegie Mellon, and UCLA) and together we came up with CS4HS. The model was based on a “train the trainer” technique. By focusing our efforts on teachers and bringing them the skills they need to implement CS in their classrooms, we would be able to reach even more students. With grants from Google, our partner universities created curricula and put together hands-on, community-based workshops for their local-area teachers.
Since the initial experiment, CS4HS has exploded into a worldwide program, reaching more than 4,000 teachers and 200,000 students either directly or indirectly in more than 34 countries. These hands-on, in-person workshops are a hallmark of our program, and we will continue to fund these projects going forward. (For information on how to apply, please see our website.) The success of this popular program speaks for itself, as we receive more quality proposals each year. But success comes at a price, and we have found that the current format of the workshops is not infinitely scalable.
This is where Research at Google comes in. This year, we are experimenting with a new model for CS4HS workshops. By harnessing the success of online courses such as Power Searching with Google, and utilizing open-source platforms like the one found in Course Builder, we are hoping to put the “M” in “MOOC” and reach a broader audience of educators, eager to learn how to teach CS in their classrooms.
For this pilot, we are looking to sponsor two online workshops to go live in 2013: one geared toward CS teachers, and one geared toward teaching CS to non-CS teachers. This is a way for a university (or several colleges working together) to create one incredible workshop that has the potential to reach thousands of enthusiastic teachers. Just as with our in-person workshops, applications will be open to colleges, universities, and technical schools of higher learning only, as we depend on their curriculum expertise to put together the most engaging programs. For this pilot, we will be looking for MOOC proposals in the US and Canada only.
We are really excited about the possibilities of this new format, and we are looking for quality applications to fund. While applications don’t have to run on our Course Builder platform, we may be able to offer some additional support to funded projects that do. If you are interested in joining our experiment or just learning more, you can find information on how to apply on our CS4HS website.
Applications are open until February 16, 2013; we can’t wait to see what you come up with. If you have questions, please email us at cs4hs@google.com.
Thursday, November 8, 2012
Website Development
Looking for a quality service to develop your website or your company's website?
Check out our portfolio and request a quote right now!
High-end website development, everything you or your company needs! Get in touch right now and request a quote!
Examples of sites we have created:
Wednesday, October 31, 2012
Large Scale Language Modeling in Automatic Speech Recognition
Posted by Ciprian Chelba, Research Scientist
At Google, we’re able to use the large amounts of data made available by the Web’s fast growth. Two such data sources are the anonymized queries on google.com and the web itself. They help improve automatic speech recognition through large language models: Voice Search makes use of the former, whereas YouTube speech transcription benefits significantly from the latter.
The language model is the component of a speech recognizer that assigns a probability to the next word in a sentence given the previous ones. As an example, if the previous words are “new york”, the model would assign a higher probability to “pizza” than to, say, “granola”. The n-gram approach to language modeling (predicting the next word based on the previous n-1 words) is particularly well-suited to such large amounts of data: it scales gracefully, and the non-parametric nature of the model allows it to grow with more data. For example, on Voice Search we were able to train and evaluate 5-gram language models consisting of 12 billion n-grams, built using large vocabularies (1 million words), and trained on as many as 230 billion words.
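To make the n-gram idea concrete, here is a toy bigram (2-gram) model in Python with raw maximum-likelihood estimates; it is only a sketch of the approach, with a made-up corpus, and deliberately ignores the smoothing, huge vocabularies, and distributed storage that the production models described above require.

from collections import Counter, defaultdict

# Toy bigram language model: P(word | previous word) from raw counts.
# A sketch of the n-gram idea only, not Google's production 5-gram models.
corpus = "new york pizza is great . new york pizza is cheap .".split()

bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def p_next(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

print(p_next("york", "pizza"))    # 1.0 in this tiny corpus
print(p_next("york", "granola"))  # 0.0 (unseen here; real systems smooth such estimates)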
The computational effort pays off: both word error rate (a measure of speech recognition accuracy) and search error rate (a metric we use to evaluate the output of the speech recognition system when used in a search engine) decrease significantly with larger language models.
A more detailed summary of results on Voice Search and a few YouTube speech transcription tasks (authors: Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar) presents our results when increasing both the amount of training data, and the size of the language model estimated from such data. Depending on the task, availability and amount of training data used, as well as language model size and the performance of the underlying speech recognizer, we observe reductions in word error rate between 6% and 10% relative, for systems on a wide range of operating points.
Sunday, October 28, 2012
Ubuntu Server distribution upgrade
Doing a distribution upgrade is easy in the graphical user interface. On a "headless" Ubuntu Server, you probably don't have a graphical user interface available, and you need to perform a few easy steps.
First you need to check 2 things:
- The package "update-manager-core" needs to be installed. Just run the following command: (it will install it, if it is not already)
sudo apt-get install update-manager-core
- On Ubuntu Server, the update channel is usually set to "lts" (Long Term Support), which only offers upgrades to the next LTS release. You probably want to set this to "normal" to allow an upgrade to every new release. To do this, open the file "/etc/update-manager/release-upgrades" and change the line "Prompt=lts" to "Prompt=normal".
For example, with pico:
sudo pico /etc/update-manager/release-upgrades
Change the Prompt setting to "normal".
Press Ctrl-X, press "Y" to confirm and press enter to confirm the filename.
Then the distribution upgrade can be started by running this command:
sudo do-release-upgrade -d
It is not "recommended" to do a distribution upgrade remotely over ssh. The "do-release-upgrade" command above checks on this, and gives a warning. However, I didn't have any problem doing the upgrade remotely over ssh.
Thursday, October 18, 2012
Converting a date/time to time_t using FxGqlC (or SQL)
The time_t type is used in C and C++ to represent a date/time. It is expressed in seconds since January 1st, 1970.
To get the current date/time as a time_t value, you can run this query in FxGqlC (or SQL):
select datediff(second, '1970-01-01', getutcdate())
You need to use getutcdate() because time_t is defined in terms of UTC.
Or for any arbitrary date/time (in UTC):
select datediff(second, '1970-01-01', '2012-10-18 22:33')
-- Returns 1350599580
The other way around is also easy: run this query to convert a time_t to a date/time
select dateadd(second, 1234567890, '1970-01-01')
-- Returns 13/02/2009 23:31:30
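As a quick sanity check outside of FxGqlC (my addition, not part of the original post), the same conversions can be reproduced in Python and give the same numbers:

import datetime

epoch = datetime.datetime(1970, 1, 1)

# date/time (UTC) -> time_t
print(int((datetime.datetime(2012, 10, 18, 22, 33) - epoch).total_seconds()))  # 1350599580

# time_t -> date/time (UTC)
print(epoch + datetime.timedelta(seconds=1234567890))  # 2009-02-13 23:31:30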
Ngram Viewer 2.0
Posted by Jon Orwant, Engineering Manager
Since launching the Google Books Ngram Viewer, we’ve been overjoyed by the public reception. Co-creator Will Brockman and I hoped that the ability to track the usage of phrases across time would be of interest to professional linguists, historians, and bibliophiles. What we didn’t expect was its popularity among casual users. Since the launch in 2010, the Ngram Viewer has been used about 50 times every minute to explore how phrases have been used in books spanning the centuries. That’s over 45 million graphs created, each one a glimpse into the history of the written word. For instance, comparing flapper, hippie, and yuppie, you can see when each word peaked:
Meanwhile, Google Books reached a milestone, having scanned 20 million books. That’s approximately one-seventh of all the books published since Gutenberg invented the printing press. We’ve updated the Ngram Viewer datasets to include a lot of those new books we’ve scanned, as well as improvements our engineers made in OCR and in hammering out inconsistencies between library and publisher metadata. (We’ve kept the old dataset around for scientists pursuing empirical, replicable language experiments such as the ones Jean-Baptiste Michel and Erez Lieberman Aiden conducted for our Science paper.)
At Google, we’re also trying to understand the meaning behind what people write, and to do that it helps to understand grammar. Last summer Slav Petrov of Google’s Natural Language Processing group and his intern Yuri Lin (who’s since joined Google full-time) built a system that identified parts of speech—nouns, adverbs, conjunctions and so forth—for all of the words in the millions of Ngram Viewer books. Now, for instance, you can compare the verb and noun forms of “cheer” to see how the frequencies have converged over time:
Some users requested the ability to combine Ngrams, and Googler Matthew Gray generalized that notion into what we’re calling Ngram compositions: the ability to add, subtract, multiply, and divide Ngram counts. For instance, you can see how “record player” rose at the expense of “Victrola”:
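If you want to try these yourself, the queries look roughly like the following in the Ngram Viewer search box (my reading of the Ngram Viewer documentation; see the info page for the authoritative syntax):

cheer_VERB, cheer_NOUN
(record player + Victrola)
record player / (record player + Victrola)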
Our info page explains all the details about this curious notion of treating phrases like components of a mathematical expression. We’re guessing they’ll only be of interest to lexicographers, but then again that’s what we thought about Ngram Viewer 1.0.
Oh, and we added Italian too, supplementing our current languages: English, Chinese, Spanish, French, German, Hebrew, and Russian. Buon divertimento!
Monday, October 15, 2012
LibreOffice doesn't appear in alt-tab view
I have an annoying problem with Unity on Ubuntu: it often happens that LibreOffice/OpenOffice Calc or Writer windows don't appear in the alt-tab application switching.
Googling the problem taught me that it is a known problem: https://bugs.launchpad.net/bamf/+bug/1026426
The good news is that it is solved in Ubuntu 12.10, and a patch will be available for 12.04 LTS. A workaround is to restart Unity:
- Start a command window (Ctrl-Alt-T)
- Run:
unity --replace & disown
Thursday, October 4, 2012
ReFr: A New Open-Source Framework for Building Reranking Models
Posted by Dan Bikel and Keith Hall, Research Scientists at Google
We are pleased to announce the release of an open source, general-purpose framework designed for reranking problems, ReFr (Reranker Framework), now available at: http://code.google.com/p/refr/.
Many types of systems capable of processing speech and human language text produce multiple hypothesized outputs for a given input, each with a score. In the case of machine translation systems, these hypotheses correspond to possible translations from some sentence in a source language to a target language. In the case of speech recognition, the hypotheses are possible word sequences of what was said derived from the input audio. The goal of such systems is usually to produce a single output for a given input, and so they almost always just pick the highest-scoring hypothesis.
A reranker is a system that uses a trained model to rerank these scored hypotheses, possibly inducing a different ranked order. The goal is that by employing a second model after the fact, one can make use of additional information not available to the original model, and produce better overall results. This approach has been shown to be useful for a wide variety of speech and natural language processing problems, and was the subject of one of the groups at the 2011 summer workshop at Johns Hopkins’ Center for Language and Speech Processing. At that workshop, led by Professor Brian Roark of Oregon Health & Science University, we began building a general-purpose framework for training and using reranking models. The result of all this work is ReFr.
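As a concrete, highly simplified picture of what a reranking model does (an illustration of the general idea only, not ReFr's actual C++ API), the sketch below rescores an n-best list by combining each hypothesis's original score with extra features under hypothetical learned weights:

# Minimal reranking sketch: rescore hypotheses with a linear model over extra
# features and pick the best one. Illustrative only; not ReFr's API.
def rerank(hypotheses, weights):
    """hypotheses: list of (text, base_score, feature_dict); returns the best text."""
    def score(hyp):
        text, base_score, features = hyp
        return base_score + sum(weights.get(name, 0.0) * value
                                for name, value in features.items())
    return max(hypotheses, key=score)[0]

nbest = [
    ("recognize speech",   -1.2, {"lm_score": -3.0, "length": 2}),
    ("wreck a nice beach", -1.1, {"lm_score": -7.5, "length": 4}),
]
weights = {"lm_score": 0.5, "length": -0.1}   # hypothetical learned weights
print(rerank(nbest, weights))                 # -> "recognize speech"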
From the outset, we designed ReFr with both speed and flexibility in mind. The core implementation is entirely in C++, with a flexible architecture allowing rich experimentation with both features and learning methods. The framework also employs a powerful runtime configuration mechanism to make experimentation even easier. Finally, ReFr leverages the parallel processing power of Hadoop to train and use large-scale reranking models in a distributed computing environment.
Tuesday, October 2, 2012
EMEA Faculty Summit 2012
Michel Benard, University Relations Manager
Last week we held our fifth Europe, Middle East and Africa (EMEA) Faculty Summit in London, bringing together 94 of EMEA’s foremost computer science academics from 65 universities across 25 countries, along with more than 60 Googlers.
This year’s jam-packed agenda included a welcome reception at the Science Museum (plus a tour of the special exhibition “Codebreaker - Alan Turing’s life and legacy”), a keynote on “Research at Google” by Alfred Spector, Vice President of Research and Special Initiatives, and a welcome address by Nelson Mattos, Vice President of Engineering and Products in EMEA, covering Google’s engineering activity and recent innovations in the region.
The Faculty Summit is a chance for us to meet with academics in Computer Science and other areas to discuss the latest exciting developments in research and education, and to explore ways in which we can collaborate via our University Relations programs.
The two-and-a-half-day program consisted of tech talks, breakout sessions, a panel on online education, and demos. The program covered a variety of computer science topics including Infrastructure, Cloud Computing Applications, Information Retrieval, Machine Translation, Audio/Video, Machine Learning, User Interface, e-Commerce, Digital Humanities, Social Media, and Privacy. For example, Ed H. Chi summarized how researchers use data analysis to understand the ways users share content with their audiences using the Circle feature in Google+. Jens Riegelsberger summarized how UI design and user experience research are essential to creating a seamless experience on Google Maps. John Wilkes discussed some of the research challenges - and opportunities - associated with building, managing, and using computer systems at massive scale. Breakout sessions ranged from technical follow-ups on the talk topics to discussing ways to increase the presence of women in computer science.
We also held one-on-one sessions where academics and Googlers could meet privately and discuss topics of personal interest, such as how to develop a compelling research award proposal, how to apply for a sabbatical at Google or how to gain Google support for a conference in a particular research area.
The Summit provides a great opportunity to build and strengthen research and academic collaborations. Our hope is to drive research and education forward by fostering mutually beneficial relationships with our academic colleagues and their universities.
Tuesday, September 18, 2012
Running Continuous Geo Experiments to Assess Ad Effectiveness
Posted by Jon Vaver, Research Scientist and Lizzy Van Alstine, Marketing Manager
Advertisers have a fundamental need to measure the effectiveness of their advertising campaigns. In a previous paper, we described the application of geo experiments to measuring the impact of advertising on consumer behavior (e.g. clicks, conversions, downloads). This method involves randomly assigning experimental units to control and test conditions and measuring the subsequent impact on consumer behavior. It is a practical way of incorporating the gold standard of randomized experiments into the analysis of marketing effectiveness. However, advertising decisions are not static, and the original method is most applicable to a one-time analysis. In a follow-up paper, we generalize the approach to accommodate periodic (ongoing) measurement of ad effectiveness.
In this expanded approach, the test and control assignments of each geographic region rotate across multiple test periods, and these rotations provide the opportunity to generate a sequence of measurements of campaign effectiveness. The data across test periods can also be pooled to create a single aggregate measurement of campaign effectiveness. These sequential and pooled measurements have smaller confidence intervals than measurements from a series of geo experiments with a single test period. Alternatively, the same confidence interval can be achieved with a reduced magnitude or duration of ad spend change, thereby lowering the cost of measurement. The net result is a better method for periodic and isolated measurement of ad effectiveness.
Tuesday, September 11, 2012
Power Searching with Google is back
Posted by Dan Russell, Uber Tech Lead, Search Quality & User Happiness
If you missed Power Searching with Google a few months ago or were unable to complete the course the first time around, now’s your chance to sign up again for our free online course that aims to empower our users with the tools and knowledge to find what they’re looking for more quickly and easily.
The community-based course features six 50-minute classes along with interactive activities and the opportunity to hear from search experts and Googlers about how search works. Beginning September 24, you can take the classes over a two-week period, share what you learn with other students in a community forum, and complete the course assessments to earn a certificate of completion.
During the course’s first run in July, people told us how they not only liked learning about new features and more efficient ways to use Google, but also enjoyed sharing tips and learning from one another through the forums and Hangouts. Ninety-six percent of people who completed the course also said they liked the format and would be interested in taking similar courses, so we plan to offer a suite of courses in the coming months, including Advanced Power Searching.
Stay tuned for further announcements on those upcoming courses, and don’t forget to register now for Power Searching with Google. You’ll learn about things like how to search by color, image, and time and how to solve harder trivia questions like our A Google a Day questions. We’ll see you when we start up in two weeks!
Helping the World to Teach
Posted by Peter Norvig, Director of Research
In July, Research at Google ran a large open online course, Power Searching with Google, taught by search expert Dan Russell. The course was successful, with 155,000 registered students. Through this experiment, we learned that Google technologies can help bring education to a global audience. So we packaged up the technology we used to build Power Searching and are providing it as an open source project called Course Builder. We want to make this technology available so that others can experiment with online learning.
The Course Builder open source project is an experimental early step for us in the world of online education. It is a snapshot of an approach we found useful and an indication of our future direction. We hope to continue development along these lines, but we wanted to make this limited code base available now, to see what early adopters will do with it, and to explore the future of learning technology. We will be hosting a community building event in the upcoming months to help more people get started using this software. edX shares in the open source vision for online learning platforms, and Google and the edX team are in discussions about open standards and technology sharing for course platforms.
We are excited that Stanford University, Indiana University, UC San Diego, Saylor.org, LearningByGivingFoundation.org, Swiss Federal Institute of Technology in Lausanne (EPFL), and a group of universities in Spain led by Universia, CRUE, and Banco Santander-Universidades are considering how this experimental technology might work for some of their online courses. Sebastian Thrun at Udacity welcomes this new option for instructors who would like to create an online class, while Daphne Koller at Coursera notes that the educational landscape is changing and it is exciting to see new avenues for teaching and learning emerge. We believe Google’s preliminary efforts here may be useful to those looking to scale online education through the cloud.
Along with releasing the experimental open source code, we’ve provided documentation and forums for anyone to learn how to develop and deploy an online course like Power Searching. In addition, over the next two weeks we will provide educators the opportunity to connect with the Google team working on the code via Google Hangouts. For access to the code, documentation, user forum, and information about the Hangouts, visit the Course Builder Open Source Project Page. To see what is possible with the Course Builder technology, register for Google’s next version of Power Searching. We invite you to explore this brave new world of online learning with us.
Wednesday, September 5, 2012
Python scripts inside PowerShell window
After installing Python on Windows, you can start a Python script by executing the script's file name.
E.g.:
.\MyScript.py
However, the script is not executed in the same PowerShell window. Instead, a new command window is opened to start "python.exe", which runs the script and immediately closes the window again.
This behavior can be changed, and the script can be executed within the current PowerShell window.
Just add the ".py" file extension to the PATHEXT environment variable. This can be done by executing this command:
$env:PATHEXT += ";.py"
You can add this command to your "$profile" file, so you don't need to execute it in every newly started PowerShell window. Run the following command to append the line to your $profile automatically:
"`n" + '$env:PATHEXT += ";.py" # Transparent execution of Python scripts' |Out-File $profile -Append -Encoding Default
You can test it by creating and running a small script:
'print("Hello, World!")' | Out-File MyScript.py -Encoding Default
.\MyScript.py
Thursday, August 30, 2012
Uploading a file to Amazon Glacier using PowerShell
For my own convenience, I created a PowerShell module to upload backup files to Amazon's new backup service "Glacier".
The module is contained in this file:
http://dl.dropbox.com/u/2350654/blog/FxAWS.zip
(including the Amazon AWS SDK library)
Extract it into your PowerShell module directory, e.g.
C:\Users\<your name>\Documents\WindowsPowerShell\Modules.
(this will create a directory FxAWS under the directory Modules)
Start PowerShell and run:
Import-Module FxAWS
Write-AWSGlacier -AWSAccessKey '<access key>' -AWSSecretKey '<secret key>' -AWSRegion <some region> -GlacierVault <vault name> -Filename <filename> -Description <description>
e.g.
Write-AWSGlacier -AWSAccessKey '[your access key]' -AWSSecretKey '[your secret key]' -AWSRegion us-east-1 -GlacierVault "backup" -Filename "backup-20120830.7z"
The parameters are:
- AWSAccessKey and AWSSecretKey: These are NOT your login and password to log on to the Amazon.com web site. You can find the access key and secret key in your Amazon.com account: https://portal.aws.amazon.com/gp/aws/securityCredentials (extra credentials can be created, and they can be removed individually).
- AWSRegion: The region where your Glacier Vault has been created.
- GlacierVault: The name of the Vault that you created in the Glacier administration website.
- FileName: The file that you want to upload.
- Description (optional): A description used in the Glacier administration website. When none is specified, the last part of the filename path is used.
You can also create 3 (global) PowerShell variables, $AWSAccessKey_Default, $AWSSecretKey_Default and $AWSRegion_Default, in which case you can omit the corresponding -AWS* parameters.
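For example, a sketch of what this could look like in your $profile (the variable names are the ones described above; the key values and region are placeholders you have to fill in yourself):
# Define the default credentials once, e.g. in your $profile
$global:AWSAccessKey_Default = '<access key>'
$global:AWSSecretKey_Default = '<secret key>'
$global:AWSRegion_Default = 'us-east-1'
# The -AWS* parameters can now be omitted
Write-AWSGlacier -GlacierVault "backup" -Filename "backup-20120830.7z"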
GIT External Diff / Merge tools
According to the GIT book, GIT supports these external Diff / Merge tools:
- [araxis] Araxis Merge - Commercial
- [bc3] Beyond Compare 3 - Commercial
- [diffuse] Diffuse - Open source
- [ecmerge] ECMerge - Commercial
- [emerge] Emerge (Emacs) - Open source
- [gvimdiff] gvimdiff - Open source
- [kdiff3] KDiff3 - Open source
- [meld] Meld merge - Open source
- [opendiff] opendiff - OS X Developer Tools
- [p4merge] P4Merge - Commercial
- [tkdiff] TkDiff - Open source
- [tortoisemerge] TortoiseMerge - Open source - [ Merge tool only ]
- [vimdiff] vimdiff - Open source
- [xxdiff] xxdiff - Open source
- [kompare] Kompare - Open source - [ Diff tool only ]
You can choose your preferred diff / merge tool by executing these commands (Windows or Linux), in this case for my favorite kdiff3:
git config --global diff.tool kdiff3
git config --global merge.tool kdiff3
The external diff is started using this command:
git difftool
You will probably get the error "The diff tool kdiff3 is not available as 'kdiff3'. external diff died, stopping at <filename>.", because GIT doesn't find your tool.
This can be resolved by adding your tool to the environment PATH variable, or by telling GIT where to find your tool(s):
git config --global difftool.kdiff3.path "C:/Program Files (x86)/KDiff3/kdiff3.exe"
git config --global mergetool.kdiff3.path "C:/Program Files (x86)/KDiff3/kdiff3.exe"
Any configuration setting can be undone by executing:
git config --global --unset <config item>
e.g.
git config --global --unset mergetool.kdiff3.path
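To verify what ended up in your configuration, the standard git config query options can be used, shown here for the kdiff3 settings above:
git config --global --get diff.tool
git config --global --get difftool.kdiff3.path
git config --global --list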
Wednesday, August 29, 2012
Users love simple and familiar designs – Why websites need to make a great first impression
Posted by Javier Bargas-Avila, Senior User Experience Researcher at YouTube UX Research
I’m sure you’ve experienced this at some point: You click on a link to a website, and after a quick glance you already know you’re not interested, so you click ‘back’ and head elsewhere. How did you make that snap judgment? Did you really read and process enough information to know that this website wasn’t what you were looking for? Or was it something more immediate?
We form first impressions of the people and things we encounter in our daily lives in an extraordinarily short timeframe. We know the first impression a website’s design creates is crucial in capturing users’ interest. In less than 50 milliseconds, users build an initial “gut feeling” that helps them decide whether they’ll stay or leave. This first impression depends on many factors: structure, colors, spacing, symmetry, amount of text, fonts, and more.
In our study we investigated how users' first impressions of websites are influenced by two design factors:
- Visual complexity -- how complex the visual design of a website looks
- Prototypicality -- how representative a design looks for a certain category of websites
We presented screenshots of existing websites that varied in both of these factors -- visual complexity and prototypicality -- and asked users to rate their beauty.
The results show that both visual complexity and prototypicality play crucial roles in the process of forming an aesthetic judgment. It happens within incredibly short timeframes between 17 and 50 milliseconds. By comparison, the average blink of an eye takes 100 to 400 milliseconds.
And these two factors are interrelated: if the visual complexity of a website is high, users perceive it as less beautiful, even if the design is familiar. And if the design is unfamiliar -- i.e., the site has low prototypicality -- users judge it as uglier, even if it’s simple.
In other words, users strongly prefer website designs that look both simple (low complexity) and familiar (high prototypicality). That means if you’re designing a website, you’ll want to consider both factors. Designs that contradict what users typically expect of a website may hurt users’ first impression and damage their expectations. Recent research shows that negative product expectations lead to lower satisfaction in product interaction -- a downward spiral you’ll want to avoid. Go for simple and familiar if you want to appeal to your users’ sense of beauty.
Tuesday, August 28, 2012
Google at UAI 2012
Posted by Kevin Murphy, Research Scientist
The conference on Uncertainty in Artificial Intelligence (UAI) is one of the premier venues for research related to probabilistic models and reasoning under uncertainty. This year's conference (the 28th) set several new records: the largest number of submissions (304 papers, last year 285), the largest number of participants (216, last year 191), the largest number of tutorials (4, last year 3), and the largest number of workshops (4, last year 1). We interpret this as a sign that the conference is growing, perhaps as part of the larger trend of increasing interest in machine learning and data analysis.
There were many interesting presentations. A couple of my favorites included:
- "Video In Sentences Out," by Andrei Barbu et al. This demonstrated an impressive system that is able to create grammatically correct sentences describing the objects and actions occurring in a variety of different videos.
- "Exploiting Compositionality to Explore a Large Space of Model Structures," by Roger Grosse et al. This paper (which won the Best Student Paper Award) proposed a way to view many different latent variable models for matrix decomposition - including PCA, ICA, NMF, Co-Clustering, etc. - as special cases of a general grammar. The paper then showed ways to automatically select the right kind of model for a dataset by performing greedy search over grammar productions, combined with Bayesian inference for model fitting.
A strong theme this year was causality. In fact, we had an invited talk on the topic by Judea Pearl, winner of the 2011 Turing Award, in addition to a one-day workshop. Although causality is sometimes regarded as something of an academic curiosity, its relevance to important practical problems (e.g., to medicine, advertising, social policy, etc.) is becoming more clear. There is still a large gap between theory and practice when it comes to making causal predictions, but it was pleasing to see that researchers in the UAI community are making steady progress on this problem.
There were two presentations at UAI by Googlers. The first, "Latent Structured Ranking," by Jason Weston and John Blitzer, described an extension to a ranking model called Wsabie, which was published at ICML in 2011 and is widely used within Google. The Wsabie model embeds a pair of items (say a query and a document) into a low dimensional space, and uses distance in that space as a measure of semantic similarity. The UAI paper extends this to the setting where there are multiple candidate documents in response to a given query. In such a context, we can get improved performance by leveraging similarities between documents in the set.
The second paper by Googlers, "Hokusai - Sketching Streams in Real Time," was presented by Sergiy Matusevych, Alex Smola and Amr Ahmed. (Amr recently joined Google from Yahoo, and Alex is a visiting faculty member at Google.) This paper extends the Count-Min sketch method for storing approximate counts to the streaming context. This extension allows one to compute approximate counts of events (such as the number of visitors to a particular website) aggregated over different temporal extents. The method can also be extended to store approximate n-gram statistics in a very compact way.
In addition to these presentations, Google was involved in UAI in several other ways: I held a program co-chair position on the organizing committee, several of the referees and attendees work at Google, and Google provided some sponsorship for the conference.
Overall, this was a very successful conference, in an idyllic setting (Catalina Island, an hour off the coast of Los Angeles). We believe UAI and its techniques will grow in importance as various organizations -- including Google -- start combining structured, prior knowledge with raw, noisy unstructured data.
Monday, August 27, 2012
Solving ANTLR errors using ANTLRWorks
Solving ANTLR grammar errors can be very difficult, especially in complex grammar files.
Below is a simple example, based on the GQL ANTLR-grammar used in FxGqlC.
(reduced to illustrate the problem. A complete grammar can be found here. GQL is a domain language similar to SQL / T-SQL)
grammar sql;
select_command
: SELECT (WS top_clause)? WS column_list EOF
;
top_clause
: TOP expression
;
column_list
: expression (WS? ',' WS? expression)*
;
expression
: expression_3
;
expression_3
: expression_2 (WS? op_3 WS? expression_2)*
;
op_3 : '+' | '-' | '&' | '|' | '^'
;
expression_2
: expression_1 (WS? op_2 WS? expression_1)*
;
op_2 : '*' | '/' | '%'
;
expression_1
: op_1 WS? expression_1
| expression_atom
;
op_1 : '~' | '+' | '-'
;
expression_atom
: NUMBER
| '(' WS? expression WS? ')'
;
SELECT : 'select' ;
TOP : 'top' ;
NUMBER : DIGIT+;
WS
: (' '|'\t'|'\n'|'\r'|'\u000C')+
;
fragment DIGIT : '0'..'9';
The 3 expression "levels" are used to handle operator precedence. The grammar is designed to be able to parse expressions like:
- SELECT 17
- SELECT 17 * 14 + 3
- SELECT 17 + 14 + 3
- SELECT - 17
- SELECT 17 * - 14 + 3
- SELECT 17 + 14 + - 3
- SELECT TOP 3 17
- ...
When trying to 'compile' or 'Interpret' the grammar in ANTLRWorks, you get this error:
[11:36:44] error(211): <notsaved>:21:43: [fatal] rule expression_3 has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
[11:36:44] warning(200): <notsaved>:21:43:
Decision can match input such as "WS {'+', '-'} WS NUMBER" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
Solving this error just by analyzing the grammar is quite a challenge, even for this very simple example. When using a large grammar file it is nearly impossible.
But ANTLRWorks has a very useful tool to show what's going wrong.
- The error message indicates that there is a problem with expression_3 (expression_3 is also indicated in red in the list of rules/tokens in the left pane).
- Put your cursor in expression_3, and select the tab "Syntax Diagram" in the lower pane.
- First, in the lower pane, select "Alternatives '1'" in the upper right corner.
==> In green you see how the grammar matches "WS '+' WS NUMBER", which is exactly what we want.
- Next, select "Alternatives '2'" in the upper right corner.
==> In red you see how the grammar matches "WS '+' WS NUMBER".
- In the latter case, you can see that the matching starts in the TOP-clause.
This is what's happening: there can be an ambiguity when parsing "SELECT TOP 1 + 2 + 20".
It is not clear where the top-clause ends and the column-list starts. Both '+' signs can be unary or binary.
- It can be: "SELECT [TOP 1] [+ 2 + 20]", being equivalent to "SELECT TOP 1 22"
- Or it can be: "SELECT [TOP 1 + 2] [+ 20]", being equivalent to "SELECT TOP 3 20"
This ambiguity must be resolved, because only one interpretation should be valid.
In this specific case, the grammar could be changed in a way that the top-clause expression should always have parentheses surrounding it when it is not a simple number.
This can easily be achieved by changing:
top_clause
: TOP expression
;
to:
top_clause
: TOP expression_atom
;
This solves the ambiguity. The text "SELECT TOP 1 + 2 + 20" is now parsed as "SELECT [TOP 1] [+ 2 + 20]".
And if somebody wants to use "1 + 2" in the TOP-clause, he should use: "SELECT TOP (1 + 2) + 20", which is parsed as: "SELECT [TOP (1 + 2)] [+ 20]"
Below you find the complete example, with the TOP-clause corrected:
grammar sql;
select_command
: SELECT (WS top_clause)? WS column_list EOF
;
top_clause
: TOP expression_atom
;
column_list
: expression (WS? ',' WS? expression)*
;
expression
: expression_3
;
expression_3
: expression_2 (WS? op_3 WS? expression_2)*
;
op_3 : '+' | '-' | '&' | '|' | '^'
;
expression_2
: expression_1 (WS? op_2 WS? expression_1)*
;
op_2 : '*' | '/' | '%'
;
expression_1
: op_1 WS? expression_1
| expression_atom
;
op_1 : '~' | '+' | '-'
;
expression_atom
: NUMBER
| '(' WS? expression WS? ')'
;
SELECT : 'select' ;
TOP : 'top' ;
NUMBER : DIGIT+;
WS
: (' '|'\t'|'\n'|'\r'|'\u000C')+
;
fragment DIGIT : '0'..'9';
Thursday, August 23, 2012
Better table search through Machine Learning and Knowledge
Posted By Johnny Chen, Product Manager, Google Research
The Web offers a trove of structured data in the form of tables. Organizing this collection of information and helping users find the most useful tables is a key mission of Table Search from Google Research. While we are still a long way away from the perfect table search, we recently made a few steps forward by revamping how we determine which tables are "good" (those that contain meaningful structured data) and which ones are "bad" (for example, tables that merely hold the layout of a Web page). In particular, we switched from a rule-based system to a machine learning classifier that can tease out subtleties from the table features and enables rapid quality improvement iterations. This new classifier is a support vector machine (SVM) that makes use of multiple kernel functions which are automatically combined and optimized using training examples. Several of these kernel combining techniques were in fact studied and developed within Google Research [1,2].
We are also able to achieve a better understanding of the tables by leveraging the Knowledge Graph. In particular, we improved our algorithms for identifying the context and topics of each table, the entities represented in the table and the properties they have. This knowledge not only helps our classifier make a better decision on the quality of the table, but also enables better matching of the table to the user query.
Finally, you will notice that we added an easy way for our users to import Web tables found through Table Search into their Google Drive account as Fusion Tables. Now that we can better identify good tables, the import feature enables our users to further explore the data. Once in Fusion Tables, the data can be visualized, updated, and accessed programmatically using the Fusion Tables API.
These enhancements are just the start. We are continually improving the quality of our Table Search and adding features to it.
Stay tuned for more from Boulos Harb, Afshin Rostamizadeh, Fei Wu, Cong Yu and the rest of the Structured Data Team.
[1] Algorithms for Learning Kernels Based on Centered Alignment
[2] Generalization Bounds for Learning Kernels
Wednesday, August 22, 2012
Machine Learning Book for Students and Researchers
Posted by Afshin Rostamizadeh, Google Research
Our machine learning book, The Foundations of Machine Learning, is now published! The book, with authors from both Google Research and academia, covers a large variety of fundamental machine learning topics in depth, including the theoretical basis of many learning algorithms and key aspects of their applications. The material presented takes its origin in a machine learning graduate course, "Foundations of Machine Learning", taught by Mehryar Mohri over the past seven years and has considerably benefited from comments and suggestions from students and colleagues at Google.
The book can serve as a textbook for both graduate students and advanced undergraduate students and a reference manual for researchers in machine learning, statistics, and many other related areas. It includes as a supplement introductory material to topics such as linear algebra and optimization and other useful conceptual tools, as well as a large number of exercises at the end of each chapter whose full solutions are provided online.
Tuesday, August 21, 2012
FxGqlC v2.3
A new version of FxGqlC has been released.
The major changes are documented here:
https://sites.google.com/site/fxgqlc/home/fxgqlc-manual/changes-in-fxgqlc-2-3
You can download FxGqlC v2.3 here:
https://sites.google.com/site/fxgqlc/home/downloads
Monday, August 20, 2012
Faculty Summit 2012: Online Education Panel
Posted by Peter Norvig, Director of Research
On July 26th, Google's 2012 Faculty Summit hosted computer science professors from around the world for a chance to talk and hear about some of the work done by Google and by our faculty partners. One of the sessions was a panel on Online Education. Daphne Koller's presentation on "Education at Scale" describes how a talk about YouTube at the 2009 Google Faculty Summit was an early inspiration for her, as she was formulating her approach that led to the founding of Coursera. Koller started with the goal of allowing Stanford professors to have more time for meaningful interaction with their students, rather than just lecturing, and ended up with a model based on the flipped classroom, where students watch videos out of class, and then come together to discuss what they have learned. She then refined the flipped classroom to work when there is no classroom, when the interactions occur in online discussion forums rather than in person. She described some fascinating experiments that allow for more flexible types of questions (beyond multiple choice and fill-in-the-blank) by using peer grading of exercises.
In my talk, I describe how I arrived at a similar approach but starting with a different motivation: I wanted a textbook that was more interactive and engaging than a static paper-based book, so I too incorporated short videos and frequent interactions for the Intro to AI class I taught with Sebastian Thrun.
Finally, Bradley Horowitz, Vice President of Product Management for Google+, gave a talk describing the goals of Google+. It is not to build the largest social network; rather it is to understand our users better, so that we can serve them better, while respecting their privacy, and keeping each of their conversations within the appropriate circle of friends. This allows people to have more meaningful conversations, within a limited context, and turns out to be very appropriate to education.
By bringing people together at events like the Faculty Summit, we hope to spark the conversations and ideas that will lead to the next breakthroughs, perhaps in online education, or perhaps in other fields. We'll find out a few years from now what ideas took root at this year's Summit.
Saturday, August 18, 2012
Regular expression matching in C++11
Part of the Boost (http://www.boost.org/) functionality for regular expressions has been included in the new C++11/C++0x standard.
This code works on GCC and even on Visual C++ 10 (Visual Studio 2010) and above:
#include <regex>
#include <string>

// note the double backslashes: \S has to be written as \\S inside a C++ string literal
std::regex rgx("(\\S+@\\S+)");
std::string str = std::regex_replace(std::string("please send an email to my@mail.com for more information"), rgx, std::string("<$1>"));
// str contains the same text, but with the e-mail address enclosed between <...>.
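If you also want to extract the matched address instead of just replacing it, std::regex_search together with std::smatch should do the job; a minimal sketch, reusing the rgx pattern from the snippet above:
std::string text = "please send an email to my@mail.com for more information";
std::smatch match;
if (std::regex_search(text, match, rgx))
{
    std::string address = match[1]; // first capture group, here "my@mail.com"
}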
More information on regular expressions can be found here: http://www.regular-expressions.info/ .
Friday, August 17, 2012
Replace all occurrences of a character in a std::string with another character
in one line of C++ code, using C++11/C++0x:
std::string str = "my#string";
std::for_each(str.begin(), str.end(), [] (char &ch) { if (ch == '#') ch = '\\'; } );
// str now contains my\string
The for_each call invokes the lambda expression for every character in the string, and the lambda replaces each '#' with a '\'.
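For this specific case (replacing one character by another), the standard std::replace algorithm does the same thing in a single call:
#include <algorithm>
#include <string>

std::string str = "my#string";
std::replace(str.begin(), str.end(), '#', '\\');
// str now contains my\string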
Tuesday, August 14, 2012
The future of technology?
Click the image to make it bigger:
Source: http://envisioningtech.com/
[Infographic: "Envisioning emerging technology for 2012 and beyond" by Michell Zappa, envisioningtech.com (last updated 2012-02-10). The chart maps predicted technologies on a timeline from 2012 to 2040, grouped into robotics, biotech, materials, energy, artificial intelligence, sensors, geoengineering, internet, interfaces, ubicomp and space, with node size indicating predicted importance, alongside quantitative forecasts such as world population, global online population, connected devices, and storage and bandwidth milestones. In the author's words: "Understanding where technology is heading is more than guesswork. Looking at emerging trends and research, one can predict and draw conclusions about how the technological sphere is developing, and which technologies should become mainstream in the coming years."]
Improving Google Patents with European Patent Office patents and the Prior Art Finder
Posted by Jon Orwant, Engineering Manager
Cross-posted with the US Public Policy Blog, the European Public Policy Blog, and Inside Search Blog
At Google, we're constantly trying to make important collections of information more useful to the world. Since 2006, we’ve let people discover, search, and read United States patents online. Starting this week, you can do the same for the millions of ideas that have been submitted to the European Patent Office, such as this one.
Typically, patents are granted only if an invention is new and not obvious. To explain why an invention is new, inventors will usually cite prior art such as earlier patent applications or journal articles. Determining the novelty of a patent can be difficult, requiring a laborious search through many sources, and so we’ve built a Prior Art Finder to make this process easier. With a single click, it searches multiple sources for related content that existed at the time the patent was filed.
Patent pages now feature a “Find prior art” button that instantly pulls together information relevant to the patent application.
The Prior Art Finder identifies key phrases from the text of the patent, combines them into a search query, and displays relevant results from Google Patents, Google Scholar, Google Books, and the rest of the web. You’ll start to see the blue “Find prior art” button on individual patent pages starting today.
Our hope is that this tool will give patent searchers another way to discover information relevant to a patent application, supplementing the search techniques they use today. We’ll be refining and extending the Prior Art Finder as we develop a better understanding of how to analyze patent claims and how to integrate the results into the workflow of patent searchers.
These are small steps toward making this collection of important but complex documents better understood. Sometimes language can be a barrier to understanding, which is why earlier this year we released an update to Google Translate that incorporates the European Patent Office’s parallel patent texts, allowing the EPO to provide translation between English, French, German, Spanish, Italian, Portuguese, and Swedish, with more languages scheduled for the future. And with the help of the United States Patent & Trademark Office, we’ve continued to add to our repository of USPTO bulk data, making it easier for researchers and law firms to analyze the entire corpus of US patents. More to come!
Friday, August 10, 2012
Export of Office Outlook contacts to GMail
To import your Microsoft Office Outlook contacts to GMail or Google Apps, you need to export them first to a CSV file.
- In Outlook, go to the "File" tab in the ribbon menu, and click "Options" in the left sidebar.
- In the Outlook Options dialog, click on "Advanced" in the sidebar, and click the "Export" button.
- In the first step of the Import and Export wizard, select "Export to a file", and click "Next".
- In the second step, select "Comma Separated Values (Windows)", and click "Next".
- In the third step, select your Contacts folder that you want to export (normally "Contacts"), and click "Next".
- In the fourth step, enter or select the filename, e.g. "contacts.csv".
- Click "Finish" to start the export.
When you import this file in GMail, and you are a member of a Windows Active Directory domain, the e-mail addresses are not imported. Instead, the e-mail address field in GMail contains the "distinguished name" of your contact as known to your Active Directory, e.g. "cn=jsmith,ou=promotions,ou=marketing,dc=noam,dc=reskit,dc=com".
However, the real e-mail address is included in the CSV file as part of the "E-mail Display Name" column, which contains the full name with the regular e-mail address between parentheses; this column isn't used by the GMail import.
You could replace all e-mail addresses in the file using an Excel formula, or manually in a text editor.
Or you can simply use this FxGqlC command to replace all e-mail address columns with the e-mail address taken from the display name:
select replaceregex($line, '\"/o=.*?\",\"EX\",(\".*?\((.*?)\)\")', '"$2","EX",$1') into [contacts2.csv] from [contacts.csv]
The same method can be used to convert national telephone numbers to an international format:
select replaceregex($line, '\+?(32\d{8,9})', '+$1') into [meucci3.csv] from [meucci2.csv]
You need to adapt the regular expression to a format appropriate for your contacts.
Import the resulting file in GMail, and that's it.