quinta-feira, 30 de agosto de 2012

Uploading a file to Amazon Glacier using PowerShell



For my own convenience, I created a PowerShell module to upload backup files to Amazon's new backup service "Glacier".

The module is contained in this file:
http://dl.dropbox.com/u/2350654/blog/FxAWS.zip
(including the Amazon AWS SDK library)
Extract it into your PowerShell module directory, e.g.
C:\Users\<your name>\Documents\WindowsPowerShell\Modules.
(this will create a directory FxAWS under the directory Modules)

Start PowerShell and run:
Import-Module FxAWS

C:\Users\wim devos.GENOFFICE> Write-AWSGlacier -AWSAccessKey '<access key>' -AWSSecretKey '<secret key>' -AWSRegion <some region> -GlacierVault <vault name> -Filename <filename> -Description <description>

e.g.

Write-AWSGlacier -AWSAccessKey '[your access key]' -AWSSecretKey '[your secret key]' -AWSRegion us-east-1 -GlacierVault "backup" -Filename "backup-20120830.7z"


The parameters are:

  • AWSAccessKey and AWSSecretKey.
    These are NOT your login and password to log on to the Amazon.com web site.
    You can find the access key and secret key in your Amazon.com account:
    https://portal.aws.amazon.com/gp/aws/securityCredentials
    Extra credentials can be created and they can be removed individually.
  • AWSRegion.
    The region where your Glacier Vault has been created.
  • GlacierVault
    The name of the Vault that you created in the Glacier administration website.
  • FileName
    The file that you want to upload.
  • Description (optional)
    A description used in the Glacier administration website.  When none is specified, the last part of the filename path is used.
You can create three global PowerShell variables, $AWSAccessKey_Default, $AWSSecretKey_Default and $AWSRegion_Default, in which case you can omit the corresponding -AWS* parameters, as shown in the sketch below.
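For example, a minimal sketch (access key, secret key, region and vault are placeholders; substitute your own values):

# Set once per session, or in your PowerShell profile:
$AWSAccessKey_Default = '<access key>'
$AWSSecretKey_Default = '<secret key>'
$AWSRegion_Default = 'us-east-1'

# The -AWS* parameters can then be omitted:
Write-AWSGlacier -GlacierVault "backup" -Filename "backup-20120830.7z"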

GIT External Diff / Merge tools

According to the GIT book, GIT supports a number of external diff / merge tools (kdiff3, meld, vimdiff, opendiff and others).

You can choose your preferred diff / merge tool by executing these commands (Windows or Linux), in this case for my favorite, kdiff3:
git config --global diff.tool kdiff3 
git config --global merge.tool kdiff3
The external diff is started using this command:
git difftool
You will probably get the error "The diff tool kdiff3 is not available as 'kdiff3'. external diff died, stopping at <filename>.", because GIT cannot find your tool.
This can be resolved by adding your tool to the environment PATH variable, or by telling GIT where to find your tool(s):
git config --global difftool.kdiff3.path "C:/Program Files (x86)/KDiff3/kdiff3.exe"
git config --global mergetool.kdiff3.path "C:/Program Files (x86)/KDiff3/kdiff3.exe" 
Any configuration can be undone by executing:
git config --global --unset <config item>
e.g.
git config --global --unset mergetool.kdiff3.path
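To verify which settings are currently in effect, you can list your global configuration (a standard GIT command, shown here purely to check the values set above):
git config --global --list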

quarta-feira, 29 de agosto de 2012

Users love simple and familiar designs – Why websites need to make a great first impression



I’m sure you’ve experienced this at some point: You click on a link to a website, and after a quick glance you already know you’re not interested, so you click ‘back’ and head elsewhere. How did you make that snap judgment? Did you really read and process enough information to know that this website wasn’t what you were looking for? Or was it something more immediate?

We form first impressions of the people and things we encounter in our daily lives in an extraordinarily short timeframe. We know the first impression a website’s design creates is crucial in capturing users’ interest. In less than 50 milliseconds, users build an initial “gut feeling” that helps them decide whether they’ll stay or leave. This first impression depends on many factors: structure, colors, spacing, symmetry, amount of text, fonts, and more.

In our study we investigated how users' first impressions of websites are influenced by two design factors:

  1. Visual complexity -- how complex the visual design of a website looks 
  2. Prototypicality -- how representative a design looks for a certain category of websites

We presented screenshots of existing websites that varied in both of these factors -- visual complexity and prototypicality -- and asked users to rate their beauty.

The results show that both visual complexity and prototypicality play crucial roles in the process of forming an aesthetic judgment. It happens within incredibly short timeframes between 17 and 50 milliseconds. By comparison, the average blink of an eye takes 100 to 400 milliseconds.

And these two factors are interrelated: if the visual complexity of a website is high, users perceive it as less beautiful, even if the design is familiar. And if the design is unfamiliar -- i.e., the site has low prototypicality -- users judge it as uglier, even if it’s simple.
In other words, users strongly prefer website designs that look both simple (low complexity) and familiar (high prototypicality). That means if you’re designing a website, you’ll want to consider both factors. Designs that contradict what users typically expect of a website may hurt users’ first impression and damage their expectations. Recent research shows that negative product expectations lead to lower satisfaction in product interaction -- a downward spiral you’ll want to avoid. Go for simple and familiar if you want to appeal to your users’ sense of beauty.

terça-feira, 28 de agosto de 2012

Google at UAI 2012



The conference on Uncertainty in Artificial Intelligence (UAI) is one of the premier venues for research related to probabilistic models and reasoning under uncertainty. This year's conference (the 28th) set several new records: the largest number of submissions (304 papers, last year 285), the largest number of participants (216, last year 191), the largest number of tutorials (4, last year 3), and the largest number of workshops (4, last year 1). We interpret this as a sign that the conference is growing, perhaps as part of the larger trend of increasing interest in machine learning and data analysis.

There were many interesting presentations. A couple of my favorites included:
  • "Video In Sentences Out," by Andrei Barbu et al. This demonstrated an impressive system that is able to create grammatically correct sentences describing the objects and actions occurring in a variety of different videos. 
  • "Exploiting Compositionality to Explore a Large Space of Model Structures," by Roger Grosse et al. This paper (which won the Best Student Paper Award) proposed a way to view many different latent variable models for matrix decomposition - including PCA, ICA, NMF, Co-Clustering, etc. - as special cases of a general grammar. The paper then showed ways to automatically select the right kind of model for a dataset by performing greedy search over grammar productions, combined with Bayesian inference for model fitting.

A strong theme this year was causality. In fact, we had an invited talk on the topic by Judea Pearl, winner of the 2011 Turing Award, in addition to a one-day workshop. Although causality is sometimes regarded as something of an academic curiosity, its relevance to important practical problems (e.g., to medicine, advertising, social policy, etc.) is becoming more clear. There is still a large gap between theory and practice when it comes to making causal predictions, but it was pleasing to see that researchers in the UAI community are making steady progress on this problem.

There were two presentations at UAI by Googlers. The first, "Latent Structured Ranking," by Jason Weston and John Blitzer, described an extension to a ranking model called Wsabie, which was published at ICML in 2011 and is widely used within Google. The Wsabie model embeds a pair of items (say a query and a document) into a low dimensional space, and uses distance in that space as a measure of semantic similarity. The UAI paper extends this to the setting where there are multiple candidate documents in response to a given query. In such a context, we can get improved performance by leveraging similarities between documents in the set.

The second paper by Googlers, "Hokusai - Sketching Streams in Real Time," was presented by Sergiy Matusevych, Alex Smola and Amr Ahmed. (Amr recently joined Google from Yahoo, and Alex is a visiting faculty member at Google.) This paper extends the Count-Min sketch method for storing approximate counts to the streaming context. This extension allows one to compute approximate counts of events (such as the number of visitors to a particular website) aggregated over different temporal extents. The method can also be extended to store approximate n-gram statistics in a very compact way.

In addition to these presentations, Google was involved in UAI in several other ways: I held a program co-chair position on the organizing committee, several of the referees and attendees work at Google, and Google provided some sponsorship for the conference.

Overall, this was a very successful conference, in an idyllic setting (Catalina Island, an hour off the coast of Los Angeles). We believe UAI and its techniques will grow in importance as various organizations -- including Google -- start combining structured, prior knowledge with raw, noisy unstructured data.

segunda-feira, 27 de agosto de 2012

Solving ANTLR errors using ANTLRWorks


Solving ANTLR grammar errors can be very difficult, especially in complex grammar files.

Below is a simple example, based on the GQL ANTLR-grammar used in FxGqlC.
(reduced to illustrate the problem. A complete grammar can be found here. GQL is a domain-specific language similar to SQL / T-SQL.)
grammar sql;
select_command
: SELECT (WS top_clause)? WS column_list EOF
;
top_clause
: TOP expression
;
column_list
: expression (WS? ',' WS? expression)*
;
expression
: expression_3
;
expression_3
: expression_2 (WS? op_3 WS? expression_2)*
;
op_3 : '+' | '-' | '&' | '|' | '^'
;
expression_2
: expression_1 (WS? op_2 WS? expression_1)*
;
op_2 : '*' | '/' | '%'
;
expression_1
: op_1 WS? expression_1
| expression_atom
;
op_1 : '~' | '+' | '-'
;
expression_atom
: NUMBER
| '(' WS? expression WS? ')'
;
SELECT : 'select' ;
TOP : 'top' ;
NUMBER : DIGIT+;
WS
: (' '|'\t'|'\n'|'\r'|'\u000C')+
;
 
fragment DIGIT : '0'..'9';

The 3 expression "levels" are used to handle operator precedence. The grammar is designed to be able to parse expressions like:

  • SELECT 17
  • SELECT 17 * 14 + 3
  • SELECT 17 + 14 + 3
  • SELECT - 17
  • SELECT 17 * - 14 + 3
  • SELECT 17 + 14 + - 3
  • SELECT TOP 3 17
  • ...

When trying to 'compile' or 'Interpret' the grammar in ANTLRWorks, you get this error:
[11:36:44] error(211): <notsaved>:21:43: [fatal] rule expression_3 has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2.  Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
[11:36:44] warning(200): <notsaved>:21:43:
Decision can match input such as "WS {'+', '-'} WS NUMBER" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

Solving this error just by analyzing the grammar is quite a challenge, even for this very simple example. When using a large grammar file it is nearly impossible.
But ANTLRWorks has a very useful tool to show what's going wrong.
  • The error message indicates that there is a problem with expression_3 (expression_3 is also indicated in red in the list of rules/tokens in the left pane).
  • Put your cursor in expression_3, and select the tab "Syntax Diagram" in the lower pane.
  • First, in the lower pane, select "Alternatives '1'" in the upper right corner.
    ==> In green you see how the grammar matches "WS '+' WS NUMBER", which is exactly what we want.
  • Next, select "Alternatives '2'" in the upper right corner.
    ==> In red you see how the grammar matches "WS '+' WS NUMBER".
  • In the latter case, you can see that the matching starts in the TOP-clause.

This is what's happening: there can be an ambiguity when parsing "SELECT TOP 1 + 2 + 20".
It is not clear where the top-clause ends and the column-list starts.  Both '+' signs can be unary or binary.
  • It can be: "SELECT [TOP 1] [+ 2 + 20]", being equivalent to "SELECT TOP 1 22"
  • Or it can be: "SELECT [TOP 1 + 2] [+ 20]", being equivalent to "SELECT TOP 3 20"

This ambiguity must be resolved, because only one interpretation should be valid.
In this specific case, the grammar could be changed in a way that the top-clause expression should always have parentheses surrounding it when it is not a simple number.
This can easily be achieved by changing:
top_clause
: TOP expression
;
to:
top_clause : TOP expression_atom ;

This solves the ambiguity. The text "SELECT TOP 1 + 2 + 20" is now parsed as "SELECT [TOP 1] [+ 2 + 20]".
And if somebody wants to use "1 + 2" in the TOP-clause, he should use: "SELECT TOP (1 + 2) + 20", which is parsed as: "SELECT [TOP (1 + 2)] [+ 20]"

Below you find the complete example, with the TOP-clause corrected:
grammar sql;
select_command
: SELECT (WS top_clause)? WS column_list EOF
;
top_clause
: TOP expression_atom
;
column_list
: expression (WS? ',' WS? expression)*
;

expression
: expression_3
;
expression_3
: expression_2 (WS? op_3 WS? expression_2)*
;
op_3 : '+' | '-' | '&' | '|' | '^'
;
expression_2
: expression_1 (WS? op_2 WS? expression_1)*
;
op_2 : '*' | '/' | '%'
;
expression_1
: op_1 WS? expression_1
| expression_atom
;
op_1 : '~' | '+' | '-'
;
expression_atom
: NUMBER
| '(' WS? expression WS? ')'
;
SELECT : 'select' ;
TOP : 'top' ;
NUMBER : DIGIT+;
WS
: (' '|'\t'|'\n'|'\r'|'\u000C')+
;
 
fragment DIGIT : '0'..'9';

quinta-feira, 23 de agosto de 2012

Better table search through Machine Learning and Knowledge



The Web offers a trove of structured data in the form of tables. Organizing this collection of information and helping users find the most useful tables is a key mission of Table Search from Google Research. While we are still a long way away from the perfect table search, we made a few steps forward recently by revamping how we determine which tables are "good" (ones that contain meaningful structured data) and which ones are "bad" (for example, a table that merely holds the layout of a Web page). In particular, we switched from a rule-based system to a machine learning classifier that can tease out subtleties from the table features and enables rapid quality improvement iterations. This new classifier is a support vector machine (SVM) that makes use of multiple kernel functions which are automatically combined and optimized using training examples. Several of these kernel combining techniques were in fact studied and developed within Google Research [1,2].

We are also able to achieve a better understanding of the tables by leveraging the Knowledge Graph. In particular, we improved our algorithms for identifying the context and topics of each table, the entities represented in the table and the properties they have. This knowledge not only helps our classifier make a better decision on the quality of the table, but also enables better matching of the table to the user query.

Finally, you will notice that we added an easy way for our users to import Web tables found through Table Search into their Google Drive account as Fusion Tables. Now that we can better identify good tables, the import feature enables our users to further explore the data. Once in Fusion Tables, the data can be visualized, updated, and accessed programmatically using the Fusion Tables API.

These enhancements are just the start. We are continually updating the quality of our Table Search and adding features to it.

Stay tuned for more from Boulos Harb, Afshin Rostamizadeh, Fei Wu, Cong Yu and the rest of the Structured Data Team.


[1] Algorithms for Learning Kernels Based on Centered Alignment
[2] Generalization Bounds for Learning Kernels

quarta-feira, 22 de agosto de 2012

Machine Learning Book for Students and Researchers



Our machine learning book, The Foundations of Machine Learning, is now published! The book, with authors from both Google Research and academia, covers a large variety of fundamental machine learning topics in depth, including the theoretical basis of many learning algorithms and key aspects of their applications. The material presented takes its origin in a machine learning graduate course, "Foundations of Machine Learning", taught by Mehryar Mohri over the past seven years and has considerably benefited from comments and suggestions from students and colleagues at Google.

The book can serve as a textbook for both graduate students and advanced undergraduate students and a reference manual for researchers in machine learning, statistics, and many other related areas. It includes as a supplement introductory material to topics such as linear algebra and optimization and other useful conceptual tools, as well as a large number of exercises at the end of each chapter whose full solutions are provided online.



segunda-feira, 20 de agosto de 2012

Faculty Summit 2012: Online Education Panel



On July 26th, Google's 2012 Faculty Summit hosted computer science professors from around the world for a chance to talk and hear about some of the work done by Google and by our faculty partners. One of the sessions was a panel on Online Education. Daphne Koller's presentation on "Education at Scale" describes how a talk about YouTube at the 2009 Google Faculty Summit was an early inspiration for her, as she was formulating her approach that led to the founding of Coursera. Koller started with the goal of allowing Stanford professors to have more time for meaningful interaction with their students, rather than just lecturing, and ended up with a model based on the flipped classroom, where students watch videos out of class, and then come together to discuss what they have learned. She then refined the flipped classroom to work when there is no classroom, when the interactions occur in online discussion forums rather than in person. She described some fascinating experiments that allow for more flexible types of questions (beyond multiple choice and fill-in-the-blank) by using peer grading of exercises.

In my talk, I described how I arrived at a similar approach, but starting from a different motivation: I wanted a textbook that was more interactive and engaging than a static paper-based book, so I too incorporated short videos and frequent interactions for the Intro to AI class I taught with Sebastian Thrun.

Finally, Bradley Horowitz, Vice President of Product Management for Google+ gave a talk describing the goals of Google+. It is not to build the largest social network; rather it is to understand our users better, so that we can serve them better, while respecting their privacy, and keeping each of their conversations within the appropriate circle of friends. This allows people to have more meaningful conversations, within a limited context, and turns out to be very appropriate to education.

By bringing people together at events like the Faculty Summit, we hope to spark the conversations and ideas that will lead to the next breakthroughs, perhaps in online education, or perhaps in other fields. We'll find out a few years from now what ideas took root at this year's Summit.

sábado, 18 de agosto de 2012

Regular expression matching in C++11


Part of the Boost (http://www.boost.org/) regular expression functionality has been included in the new C++11/C++0x standard.

This code works on GCC and even on Visual C++ 10 (Visual Studio 2010) and above:
    // Backslashes must be escaped in the C++ string literal: "\\S" instead of "\S".
    std::regex rgx("(\\S+@\\S+)");

    std::smatch result; // can be filled by std::regex_search (see below)
    std::string str = std::regex_replace(std::string("please send an email to my@mail.com for more information"), rgx, std::string("<$1>"));
    // str contains the same text, but with the e-mail address enclosed between <...>.
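The std::smatch declared above is useful together with std::regex_search, which extracts the matched address instead of rewriting the whole string. A minimal self-contained sketch using the same text and pattern:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::regex rgx("(\\S+@\\S+)");
    std::string text = "please send an email to my@mail.com for more information";

    std::smatch result;
    if (std::regex_search(text, result, rgx))
    {
        // result[0] is the full match, result[1] the first capture group.
        std::cout << "found address: " << result[1] << std::endl;
    }
    return 0;
}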

More information on regular expressions can be found here: http://www.regular-expressions.info/ .

sexta-feira, 17 de agosto de 2012

Replace all occurrences of a character in a std::string with another character


in one line of C++ code, using C++11/C++0x:

std::string str = "my#string";
std::for_each(str.begin(), str.end(), [] (char &ch) { if (ch == '#') ch = '\\'; }  );
// str now contains my\string

The for_each call invokes the lambda expression (the part between the square brackets and the closing brace) for every character, and the lambda replaces each '#' with a '\'.
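If you only need a straight character-for-character swap, the standard std::replace algorithm from <algorithm> does the same thing without a lambda; the lambda version mainly pays off when the replacement logic is more involved. A minimal sketch:

#include <algorithm>
#include <string>

int main()
{
    std::string str = "my#string";
    std::replace(str.begin(), str.end(), '#', '\\');
    // str now contains my\string
    return 0;
}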

terça-feira, 14 de agosto de 2012

The future of technology?



Source: http://envisioningtech.com/

[Infographic: "Envisioning emerging technology for 2012 and beyond" by Michell Zappa (mz@envisioningtech.com), last updated 2012-02-10. A timeline of predicted technology developments from 2012 to 2040, grouped into robotics, biotech, materials, energy, artificial intelligence, sensors, geoengineering, internet, interfaces, ubicomp and space, with node size indicating each technology's predicted importance.]

Improving Google Patents with European Patent Office patents and the Prior Art Finder



Cross-posted with the US Public Policy Blog, the European Public Policy Blog, and Inside Search Blog

At Google, we're constantly trying to make important collections of information more useful to the world. Since 2006, we’ve let people discover, search, and read United States patents online. Starting this week, you can do the same for the millions of ideas that have been submitted to the European Patent Office, such as this one.

Typically, patents are granted only if an invention is new and not obvious. To explain why an invention is new, inventors will usually cite prior art such as earlier patent applications or journal articles. Determining the novelty of a patent can be difficult, requiring a laborious search through many sources, and so we’ve built a Prior Art Finder to make this process easier. With a single click, it searches multiple sources for related content that existed at the time the patent was filed.

Patent pages now feature a “Find prior art” button that instantly pulls together information relevant to the patent application.

The Prior Art Finder identifies key phrases from the text of the patent, combines them into a search query, and displays relevant results from Google Patents, Google Scholar, Google Books, and the rest of the web. You’ll start to see the blue “Find prior art” button on individual patent pages starting today.

Our hope is that this tool will give patent searchers another way to discover information relevant to a patent application, supplementing the search techniques they use today. We’ll be refining and extending the Prior Art Finder as we develop a better understanding of how to analyze patent claims and how to integrate the results into the workflow of patent searchers.

These are small steps toward making this collection of important but complex documents better understood. Sometimes language can be a barrier to understanding, which is why earlier this year we released an update to Google Translate that incorporates the European Patent Office’s parallel patent texts, allowing the EPO to provide translation between English, French, German, Spanish, Italian, Portuguese, and Swedish, with more languages scheduled for the future. And with the help of the United States Patent & Trademark Office, we’ve continued to add to our repository of USPTO bulk data, making it easier for researchers and law firms to analyze the entire corpus of US patents. More to come!

sexta-feira, 10 de agosto de 2012

Export of Office Outlook contacts to GMail


To import your Microsoft Office Outlook contacts to GMail or Google Apps, you need to export them first to a CSV file.
  • In Outlook, go to the "File" tab in the ribbon menu, and click "Options" in the left sidebar.
  • In the Outlook Options dialog, click on "Advanced" in the sidebar, and click the "Export" button. 
  • In the first step of the Import and Export wizard, select "Export to a file", and click "Next".
  • In the second step, select "Comma Separated Values (Windows)", and click "Next".
  • In the third step, select your Contacts folder that you want to export (normally "Contacts"), and click "Next".
  • In the fourth step, enter or select the filename, e.g. "contacts.csv".
  • Click "Finish" to start the export.



When you import this file in GMail, and you are a member of a Windows Active Directory domain, the e-mail addresses are not imported.  Instead, the e-mail address field in GMail contains the "distinguished name" of your contact as known to your Active Directory.  E.g. "cn=jsmith,ou=promotions,ou=marketing,dc=noam,dc=reskit,dc=com".
The real e-mail address is, however, included in the CSV file, as part of the column "E-mail Display Name", which contains the full name and the regular e-mail address between parentheses; this column isn't used by the GMail import.

You could replace all e-mail addresses in the file using an Excel formula, or manually in a text editor.
Or you can simply use this FxGqlC command to replace all e-mail address columns with the e-mail address taken from the display name:

select replaceregex($line, '\"/o=.*?\",\"EX\",(\".*?\((.*?)\)\")', '"$2","EX",$1') into [contacts2.csv] from [contacts.csv]

The same method can be used to convert national telephone numbers to an international format:
select replaceregex($line, '\+?(32\d{8,9})', '+$1') into [meucci3.csv] from [meucci2.csv]

You need to adapt the regular expression to a format appropriate for your contacts (the example above matches numbers starting with country code 32).

Import the resulting file in GMail, and that's it.

Cleanup old files from your harddisk using PowerShell



This script removes all files under the current folder (in this case 'c:\temp') that have not been accessed during the last 3 months.  A confirmation is requested for each item because -Confirm is included.

Get-ChildItem -Recurse | where { $_.LastAccessTime -lt (Get-Date).AddMonths(-3) } | Remove-Item -Recurse -Force -Confirm
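If you first want to see which files would be removed without actually deleting anything, the standard -WhatIf switch turns the same command into a dry run:

# Preview only: lists the files that would be deleted, but removes nothing.
Get-ChildItem -Recurse | where { $_.LastAccessTime -lt (Get-Date).AddMonths(-3) } | Remove-Item -Recurse -Force -WhatIf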


quarta-feira, 8 de agosto de 2012

Teaching the World to Search



For two weeks in July, we ran Power Searching with Google, a MOOC (Massive Open Online Course) similar to those pioneered by Stanford and MIT. We blended this format with our social and communication tools to create a community learning experience around search. The course covered tips and tricks for Google Search, like using the search box as a calculator, or color filtering to find images.

The course had interactive activities to practice new skills and reinforce learning, and many opportunities to connect with other students using tools such as Google Groups, Moderator and Google+. Two of our search experts, Dan Russell and Matt Cutts, moderated Hangouts on Air, answering dozens of questions from students in the course. There were pre-, mid- and post-class assessments that students were required to pass to receive a certificate of completion. The course content is still available.

We had 155,000 students register for the course, from 196 countries. Of these, 29% of those who completed the first assessment passed the course and received a certificate. What was especially surprising was that 96% of the students who completed the course liked the format and would be interested in taking other MOOCs.

This learning format is not new, as anyone who has worked in eLearning over the past 20 years knows. But what makes it different now is the large, global cohort of students who go through the class together. The discussion forums and Google+ streams were very active with students asking and answering questions, and providing additional ideas and content beyond what’s offered by the instructor. This learning interaction enabled by a massive “classroom”, is truly a new experience for students and teachers in an online environment.

Going forward, we will be offering Power Searching with Google again, so if you missed the first opportunity to get your certificate, you’ll have a second chance. Watch here for news about Power Searching as well as some educational ideas that we are exploring.

segunda-feira, 6 de agosto de 2012

Speech Recognition and Deep Learning



The New York Times recently published an article about Google’s large scale deep learning project, which learns to discover patterns in large datasets, including... cats on YouTube!

What’s the point of building a gigantic cat detector you might ask? When you combine large amounts of data, large-scale distributed computing and powerful machine learning algorithms, you can apply the technology to address a large variety of practical problems.

With the launch of the latest Android platform release, Jelly Bean, we’ve taken a significant step towards making that technology useful: when you speak to your Android phone, chances are, you are talking to a neural network trained to recognize your speech.

Using neural networks for speech recognition is nothing new: the first proofs of concept were developed in the late 1980s(1), and after what can only be described as a 20-year dry-spell, evidence that the technology could scale to modern computing resources has recently begun to emerge(2). What changed? Access to larger and larger databases of speech, advances in computing power, including GPUs and fast distributed computing clusters such as the Google Compute Engine, unveiled at Google I/O this year, and a better understanding of how to scale the algorithms to make them effective learners.

The research, which reduces the error rate by over 20%, will be presented(3) at a conference this September, but true to our philosophy of integrated research, we’re delighted to bring the bleeding edge to our users first.

--

1 Phoneme recognition using time-delay neural networks, A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K.J. Lang. IEEE Transactions on Acoustics, Speech and Signal Processing, vol.37, no.3, pp.328-339, Mar 1989.

2 Acoustic Modeling using Deep Belief Networks, A. Mohamed, G. Dahl and G. Hinton. Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing.

3 Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition, N. Jaitly, P. Nguyen, A. Senior and V. Vanhoucke, Accepted for publication in the Proceedings of Interspeech 2012.

quinta-feira, 2 de agosto de 2012

FxGqlC: Added support for DateTime datatype


SELECT convert(string, convert(datetime, '2012-07-13'), 'yyyyMMdd HH:mm:ss') 
-- Formats datetime using a format string, as defined by the .net Framework
--   "Standard Date and Time Format Strings" (http://msdn.microsoft.com/en-us/library/az4se3k1), and 
--   "Custom Date and Time Format Strings" (http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx)
SELECT datepart(day, '2012-07-13') -- returns 13
-- valid datepart values are: (with examples for '2012-07-12 23:59:50.1234567')
--    year,        yy, yyyy :   2012
--    quarter,     qq, q    :         3  (1 ... 4)
--    month,       mm, m    :         7  (1 ... 12)
--    dayofyear,   dy, y    :       194  (1 ... 366) 
--    day,         dd, d    :        12  (1 ... 31)
--    weekday,     dw, w    :         5  (1 = Sunday ... 7 = Saturday)
--    hour,        hh, h    :        23  (0 ... 23)
--    minute,      mi, n    :        59  (0 ... 59)
--    second,      ss, s    :        50  (0 ... 59)
--    millisecond, ms       :       123  (0 ... 999)
--    microsecond, mcs      :    123456  (0 ... 999999)
--    nanosecond,  ns       : 123456700  (0 ... 999999900)
SELECT dateadd(day, 10, '2012-07-03')
-- returns 2012-07-13
SELECT datediff(day, '2012-07-03', '2012-07-13') 
-- returns 10
SELECT datediff(day, '2012-07-12 23:59', '2012-07-13 00:01') 
-- returns 1, the number of day-boundaries crossed (as in T-SQL)
SELECT datediff(day, '2012-07-13 23:59', '2012-07-13 00:01') 
-- returns 0
SELECT datediff(day, '2012-07-14 23:59', '2012-07-13 00:00')  
-- returns -1
SELECT getdate(), getutcdate() 
-- returns current DateTime in local and UTC/GMT time

Reflections on Digital Interactions: Thoughts from the 2012 NA Faculty Summit



Last week, we held our eighth annual North America Computer Science Faculty Summit at our headquarters in Mountain View. Over 100 leading faculty joined us from 65 universities located in North America, Asia Pacific and Latin America to attend the two-day Summit, which focused on new interactions in our increasingly digital world.

In my introductory remarks, I shared some themes that are shaping our research agenda. The first relates to the amazing scale of systems we now can contemplate. How can we get to computational clouds of, perhaps, a billion cores (or processing elements)? How can such clouds be efficient and manageable, and what will they be capable of? Google is actively working on most aspects of large scale systems, and we continue to look for opportunities to collaborate with our academic colleagues. I note that we announced a cloud-based program to support Education based on Google App Engine technology.

Another theme in my introduction was semantic understanding. With the introduction of our Knowledge Graph and other work, we are making great progress toward data-driven analysis of the meaning of information. Users, who provide a continual stream of subtle feedback, drive continuous improvement in the quality of our systems, whether about a celebrity, the meaning of a word in context, or a historical event. In addition, we have found that the combination of information from multiple sources helps us understand meaning more efficiently. When multiple signals are aggregated, particularly with different types of analysis, we have fewer errors and improved semantic understanding. Applying this “combination hypothesis” makes systems more intelligent.

Finally, I talked about User Experience. Our field is developing ever more creative user interfaces (which both present information to users and accept information from them), partly due to the revolution in mobile computing, but also in part to the availability of large-scale processing in the cloud and deeper semantic understanding. There is no doubt that our interactions with computers will be vastly different 10 years from now, and they will be significantly more fluid and natural.

This page lists the Googler and Faculty presentations at the summit.

One of the highest intensity sessions we had was the panel on online learning with Daphne Koller from Stanford/Coursera, and Peter Norvig and Bradley Horowitz from Google. While there is a long way to go, I am so pleased that academicians are now thinking seriously about how information technology can be used to make education more effective and efficient. The infrastructure and user-device building blocks are there, and I think the community can now quickly get creative and provide the experiences we want for our students. Certainly, our own recent experience with our online Power Searching Course shows that the baseline approach works, but it also illustrates how much more can be done.

I asked Elliot Soloway (University of Michigan) and Cathleen Norris (University of North Texas), two faculty attendees, to provide their perspective on the panel, and they have posted their reflections on their blog.

The digital era is changing the human experience. The summit talks and sessions exemplified the new ways in which we interact with devices, each other, and the world around us, and revealed the vast potential for further innovation in this space. Events such as these keep ideas flowing, and it’s immensely fun to be part of a very broadly based computer science community.