Sunday, February 12, 2012

Counting


Letters and numbers are the two eyes of man. - Thirukkural (392)

Numbers, the oldest and greatest invention of mankind, are the basis of measurement, without which no useful invention could ever have been made. The objective of this post is to think through how measurements, particularly counting, were represented in ancient Tamil culture.

Ancient Tamil culture:

Measurements are broadly classified into seven kinds:
  1. Counting (எண்ணல்)
  2. Weight (நிறுத்தல்)
  3. Liquid volume (முகத்தல்)
  4. Dry volume (பெய்தல்)
  5. Length (நீட்டல்)
  6. Time (தெறிப்பு)
  7. Comparison (சார்த்தல்)
Let's limit the scope of this post to counting.

Counting is generally classified into two parts:
  • Chitrilakkam - Fractions.
  • Perilakkam - Whole Numbers.
The following shows the names given for the fractions:

[Table of fraction names: image not available]

The following shows the names given for the whole numbers:

[Table of whole-number names: image not available]

Other large numbers include:
  • 100 crores is Kumbam ( கும்பம் )
  • 1 lakh crores is Sangam ( சங்கம் )
  • 1 crore crores is Thamarai ( தாமரை )
  • 100 crore crores is Vaaranam ( வாரணம் )
A few other related words are:
  • Infinity is called Mudivili ( முடிவிலி )
  • One without any origin is called Anaathi ( அனாதி )
The word Anaathi (அனாதி), which refers to God, is the origin of the word Anathai (அனாதை), meaning orphan, implying that God is an orphan.

The names used for the numbers also correspond to living and non-living things of a comparable size. For example, Nunmanal (நுண்மணல்), fine sand, is smaller than Mundiri (முந்திரி), a cashew. Similarly, Vaaranam (வாரணம்), an elephant, is bigger than Thamarai (தாமரை), a lotus. This also reflects how closely people lived with nature.

Additionally, having names for such large numbers is evidence of how far their mathematics and calculation had advanced.

Though the Indian numbering system does have Sanskrit names such as arab, kharab and padma, it is a pity that we hardly use them in modern life.

For example, in the 2G scam, the newspaper headline said "Scam of 1.76 lakh crore rupees". No Tamil newspaper published "1.76 சங்கம் ரூபாய் ஊழல்" (1.76 sangam rupees scam). Hardly any Hindi newspaper printed "1.76 शङ्कु रुपया घोटाला" (1.76 Sangu rupees scam).

Our ancestors, who lived a glorious and advanced life, handed down to us a vast body of knowledge, which we unfortunately discard in our busy lives. Learning from their lives will make our lives better.

Sunday, February 5, 2012

The human synergy

After nearly two and a half years of inactivity, I'm back to throw more ramblings into the open space of the internet.

The plan is to keep a weekly journal on an idea or a product that I understood or learnt about that week.

Let's get started. This week's cynosure is "reCAPTCHA".

We often see sites that ask us to type the content of a distorted or skewed image of letters or numbers. That image is known as a CAPTCHA.


It is widely known that a CAPTCHA enables a site to distinguish a human from automated bots or scripts.

It is reliable enough that a vast number of sites use it, and about 200 million CAPTCHAs are answered by humans every day. It takes roughly 10 seconds to answer one. In total, that is 200 million × 10 seconds, or more than 500,000 hours of human effort each day, spent on CAPTCHAs that do nothing more than confirm that the input was entered by a human.

Could this human effort be used for a higher purpose? Yes. The answer is reCAPTCHA.


reCAPTCHA uses this humongous human effort to digitize books.

At present, OCR technology, which is used to extract content from scanned material including books, is not perfect, and its success rate is only about 60 percent. reCAPTCHA fills the gap by having humans identify and correct the words that OCR gets wrong.



Here is how it works.
  1. reCAPTCHA shows two words to the user: one that is known to the computer and one that the computer is not sure about.
  2. The computer authenticates the user based on the input for the word that it already knows.
  3. The computer records the user's input for the word that it is not sure about.
The same reCAPTCHA is shown to multiple users. Their inputs for the word that the computer is not sure about are aggregated, and the most common answer is taken as the match.
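
The aggregation step can be sketched in a few lines of C#. This is only an illustration under stated assumptions, not how reCAPTCHA is actually implemented: the class and method names are hypothetical, answers are trimmed and compared case-insensitively, and a tie goes to the answer that was seen first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names are illustrative, not part of reCAPTCHA.
public static class WordConsensus
{
    // Returns the most frequently submitted transcription of the unknown
    // word, or null when there are no usable submissions.
    public static string MostCommonAnswer(IEnumerable<string> answers)
    {
        return answers
            .Select(a => a.Trim().ToLowerInvariant()) // assumption: ignore case and padding
            .Where(a => a.Length > 0)                 // drop empty submissions
            .GroupBy(a => a)                          // groups keep first-seen order
            .OrderByDescending(g => g.Count())        // stable sort: ties keep that order
            .Select(g => g.Key)
            .FirstOrDefault();
    }
}
```

For the submissions "morning", "Morning ", "moming" and "morning", the method returns "morning", since three of the four answers agree once case and padding are ignored.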

This method has been found to produce very accurate results, with a success rate of about 99.5%, and has been used to digitize old editions of The New York Times and books from Google Books.

In India, there is active research on OCR for digitizing old manuscripts in Indian languages. Having reCAPTCHA in Indian languages would help these projects significantly. Thanks to Google transliteration, typing in Indian languages is no longer an irksome job. The user could also be given the choice of a language they are conversant in.

So, every time we solve a reCAPTCHA, we can be glad that we are adding to human knowledge through synergy.

Reference:
Interesting TED video about reCAPTCHA:

Tuesday, October 6, 2009

Coupling

A good design strives for low coupling.
  • The lowest form of coupling is a set of disconnected objects, which cannot form a system (since the objects don't interact with each other).
  • The highest form of coupling is inheritance, which is one of the pillars of OOP.

The answer to this irony is that everything is a trade-off in software development. Principles are only guidelines, not rules.

An analogy: "Drive slowly" is a common driving safety guideline. At the slowest speed (0 km/hr), the vehicle doesn't move. Also, people generally want to buy a vehicle that gives maximum power and speed. It's a guideline to be applied pragmatically.


P.S.: Another post of mine shows a legitimate violation of the Command Query Separation Principle.

Distributed Versioning

Version control systems are generally rated as follows, from least to most usable:
  1. Visual Source Safe (Low)
  2. CVS
  3. SVN
  4. Git (High)

Visual Source Safe
Cons:
  • Doesn't allow "Concurrent Access". Only one developer can edit a file at a time, so it is not suitable for a large team with frequent commits.
  • Sometimes gives weird error messages, which makes the system feel unstable.
CVS
Pro:
  • Allows "Concurrent Access". Multiple developers can edit the same file, and the system can merge the changes. If there is a conflict, the developer is asked to resolve it.
Cons:
  • Commits are not atomic. Changes are tracked at the file level and not at the repository level. For example, if we commit five files at once and want to revert the commit, each file has to be reverted individually.
SVN
Pro:
  • Commits are atomic. Files committed together can be reverted together, because the repository tracks each check-in as a whole.
Cons:
  • Cannot view the history of a file when not connected to the server.
Git (Distributed Revision Control System (DRCS))
Pro:
  • Along with the master repository, every developer has a full copy of the repository. Hence, even when disconnected, the developer can see the history of the files. The developer can later apply a patch to the master repository.
  • The patch can also be sent to a peer, which makes Git excellent for working in a distributed team.
In a centralized team, SVN is a good choice for version control, and Git is ideal for distributed teams.

Review - I

During code reviews, one of the things to look for is a violation of the "Command Query Separation Principle".

Operations should either do something and return nothing, or return something and do nothing, but NOT both.

The common violations are
  1. Getter methods having void as their return type, and
  2. Setter methods having a return value.

Getter methods having void as their return type.
For example:
void GetSalary(Employee emp)
{
    // Fetch the salary from some source (FetchSalary is a hypothetical helper).
    Money salary = FetchSalary(emp);
    emp.Salary = salary;
}
Generally, these violations can be fixed by renaming the method from Get to Update, since it changes the employee rather than returning a value.

void UpdateSalary(Employee emp);

Setter methods having a return value.
For example:
bool SetDepartment(Employee emp);
Here, it gets tricky. The programmer sets the department inside the method and wants confirmation through the return value.

There are other ways to handle this than returning a value.
  1. Add a method HasDepartment() to Employee, to check whether it has a department.
  2. Raise an exception if no matching department can be found for the employee.
Other violations of this principle usually come from trying to do too many things in one method, and such methods have to be refactored.
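
A minimal sketch of both alternatives follows. It is an illustration only: the Role property, the DepartmentService class and the role-to-department table are assumptions made for the example and do not come from the reviewed code.

```csharp
using System;
using System.Collections.Generic;

public class Employee
{
    public string Role { get; }
    public string Department { get; set; }

    public Employee(string role) { Role = role; }

    // Query: returns a value and changes nothing.
    public bool HasDepartment()
    {
        return Department != null;
    }
}

// Hypothetical service; the lookup table stands in for a real data source.
public static class DepartmentService
{
    private static readonly Dictionary<string, string> DepartmentByRole =
        new Dictionary<string, string>
        {
            { "developer", "Engineering" },
            { "accountant", "Finance" }
        };

    // Command: changes state and returns nothing.
    // Failure is reported with an exception instead of a bool.
    public static void SetDepartment(Employee emp)
    {
        string department;
        if (!DepartmentByRole.TryGetValue(emp.Role, out department))
        {
            throw new InvalidOperationException(
                "No matching department for role '" + emp.Role + "'.");
        }
        emp.Department = department;
    }
}
```

A caller that needs confirmation can call DepartmentService.SetDepartment(emp) and then ask emp.HasDepartment(), or simply let the exception propagate when no department matches.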

But there is one case where this violation is acceptable: Fluent Interfaces. In this case, the signature becomes something like the following.

Employee SetDepartment(Employee emp);
In this case too, the return value is generally an entity type and not a primitive or Value Object type.
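
As a rough illustration of why returning the entity helps, here is a small fluent sketch. FluentEmployee, its properties and the string parameters are hypothetical stand-ins chosen for the example; the signature above takes an Employee instead.

```csharp
// Hypothetical entity used only to illustrate method chaining.
public class FluentEmployee
{
    public string Department { get; private set; }
    public string Title { get; private set; }

    // Each setter changes state and returns the same entity,
    // so that calls can be chained.
    public FluentEmployee SetDepartment(string department)
    {
        Department = department;
        return this;
    }

    public FluentEmployee SetTitle(string title)
    {
        Title = title;
        return this;
    }
}
```

Usage then reads as one sentence: new FluentEmployee().SetDepartment("Finance").SetTitle("Analyst"). The return value carries no new information, which is why this form of the violation is usually tolerated.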

Thursday, July 5, 2007

Download File from Internet using .NET

The WebClient class provides common methods for sending data to and receiving data from any local, intranet, or internet resource identified by a URI.
Namespace: System.Net
Assembly: System (in system.dll)

The DownloadFile method of this class downloads data from a resource to a local file.

Snippet:

using (WebClient client = new WebClient())
{
    client.DownloadFile(source, destination);
}

Example:
source - http://www.whitelines.nl/html/google-page-rank.xls
destination - C:\GooglePageRank.xls

Note: By default, the .NET Framework supports URIs that begin with http:, https:, ftp:, and file: scheme identifiers.
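
Since file: URIs are supported, the call can be tried without a network connection. The sketch below is one way to do that and is not part of the original snippet: the DownloadDemo class and DownloadLocalCopy method are hypothetical names, and the destination is a random file in the temp folder. Newer .NET versions mark WebClient as obsolete in favor of HttpClient, so expect a compiler warning there.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical demo class; only WebClient.DownloadFile comes from the post.
public static class DownloadDemo
{
    // Copies a local file through WebClient by addressing it with a
    // file: URI, and returns the path of the downloaded copy.
    public static string DownloadLocalCopy(string sourcePath)
    {
        // Turn the local path into a file: URI string.
        string source = new Uri(Path.GetFullPath(sourcePath)).AbsoluteUri;
        string destination = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());

        using (WebClient client = new WebClient())
        {
            client.DownloadFile(source, destination);
        }
        return destination;
    }
}
```

Replacing the file: URI with an http: or https: address gives the internet download described above.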