MehalPatelCS200W: November 2013

Saturday, November 30, 2013

Communications and Security: Bad Passwords!

Have you ever wondered why we use "passwords" to gain access to computing resources such as e-mail, social networking websites and personal computers? Before I answer the question of why passwords are a convenient but bad choice for authentication, let's look at a couple of other options available for authentication. Biometric systems (heartbeat and retina scans) and smart cards are a few other types of authentication systems. Why don't we use these systems everywhere? Can you imagine logging in to your Facebook account through a biometric system? Why does it sound implausible? Because these authentication systems cost money. Unlike passwords, they aren't free!

While it's true that passwords are not a very secure way of authentication, they are quite popular because they are more convenient than other types of authentication systems. However, care should be taken when choosing a password. It's also true that when an extra level of security is required, other types of systems are used along with password authentication (for example, a PIN plus smart card access). The points below will help you understand why passwords are bad for authentication.

Computers are good at remembering a random alphanumeric string while humans are not. This forces humans to choose not-so-random passwords, which makes passwords vulnerable to attacks. It's easier to crack a not-so-random password (using a dictionary attack, a variant of brute force) than a randomly generated password of the same length.
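To get a rough feel for the gap described above, here is a minimal sketch that compares the number of guesses an attacker might need in each case. The wordlist size and the number of variants per word are made-up numbers for illustration, not measurements.

```python
import math

ALPHANUMERIC = 62  # a-z, A-Z, 0-9


def random_password_guesses(length: int, alphabet_size: int = ALPHANUMERIC) -> int:
    """Size of the search space for a password chosen uniformly at random."""
    return alphabet_size ** length


# An 8-character random alphanumeric password.
random_guesses = random_password_guesses(8)

# Hypothetical dictionary attack: 1,000,000 common words,
# each tried with 100 simple variants (digits appended, capitals, etc.).
wordlist_guesses = 1_000_000 * 100

print(f"random 8-char password: about {math.log2(random_guesses):.1f} bits")
print(f"word-based password:    about {math.log2(wordlist_guesses):.1f} bits")
print(f"the random one is about {random_guesses // wordlist_guesses:,} times harder to guess")
```

Under these assumptions the attacker's work drops by many orders of magnitude as soon as the password is based on a known word.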

Often, websites help you reset or recover a forgotten password based on security questions. We tend to choose questions whose answers are directly related to our life, studies or interests. If an attacker can get hold of your personal information, it might help them recover or reset your password.

When you log in to a website, your login credentials are not necessarily sent encrypted over the internet to the website's servers. If an attacker sniffs the packets exchanged between user and server, the attacker might get hold of the login credentials and thus gain unauthorized access to the user's account.

Social engineering and keystroke logging are other examples of attacks that make password-based authentication vulnerable. While passwords are really bad for authentication, they are the most convenient to use because they are free (the biggest advantage), unlike the other authentication systems we saw above.

Remember the points below to ensure a good level of security when it comes to passwords:

1) Do not use passwords based on English dictionary words. Use a mixture of letters, numbers and special characters.
2) Do not base your password on personal information such as your name, age or birth date. This information can easily be obtained by others.
3) Do not use the same password for all or most of your websites and/or personal computer logins. If your "common password" is compromised, all your accounts get compromised too!
4) Choose your password recovery questions wisely when signing up on a website. Pick questions whose answers are not obvious to others.
5) When it comes to online bank accounts, remember to use a very complex password!
6) Use a private browsing window (Incognito in Chrome) whenever you access accounts on public computers.
7) Do not get carried away by "Free Wi-Fi" spots unless you trust the Wi-Fi provider!
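
As a small illustration of points 1 and 3, one way to avoid dictionary words and reused passwords is to let the computer pick the characters for you. The sketch below uses Python's standard `secrets` module; the function name `generate_password` and the default length are my own choices for the example.

```python
import secrets
import string

# Letters, digits and special characters, as suggested in point 1.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 16) -> str:
    """Return a password whose characters are picked at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Generate a different password for each account (point 3).
for site in ("e-mail", "bank", "social network"):
    print(site, generate_password())
```

A password like this is hard to remember, so in practice it would be kept in a password manager rather than in your head.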

Sunday, November 24, 2013

Artificial Intelligence: Probabilistic Reasoning

Human intelligence is based on constant learning and past experience. We apply our prior knowledge in one way or another to do things. In more or less the same manner, we can have machines with intelligence. Machines (no, not those military machines), i.e. computers, can be given intelligence to assist us with various tasks. The branch of Computer Science that deals with the study of machine intelligence is called Artificial Intelligence, and its roots date back to 1956.

Artificial Intelligence is useful in numerous areas. Over a period of more than 50 years, many techniques and tools have been developed to solve a variety of difficult problems. Neural networks, probabilistic methods for uncertain reasoning, statistical learning methods, and intelligent search and optimization are some of the tools and techniques used to solve problems in domains where Artificial Intelligence is useful. Let's understand how probabilistic methods for uncertain reasoning work.

Consider, for example, an application where we want a system that can help us diagnose a disease for a given patient (this is called probabilistic inference). The end result will be a probability value. The system more or less acts as a doctor. Now, what set of inputs does this system take? We might have inputs like the symptoms the patient has, the results of medical tests and possibly the patient's habits (e.g. drinking, smoking etc.). The set of inputs really depends upon the type of diagnostic system we want to develop. The inputs are expressed in terms of probability values. The system being developed goes through 3 phases: Representation, Inference and Learning.

Each of these individual inputs is related to other inputs, and they influence one another. The interaction of all these inputs is captured via a graphical model, which is constructed using either a Bayesian network or a Markov network - Representation. Each node in the graphical model corresponds to one of the inputs. So, for example, when we know that the patient is a chain smoker, the probability that the patient has a certain type of cancer goes up - Inference. So, using the information available to us (information fed to the system), we eventually come to a conclusion based on the interactions captured in our graphical model.
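
To make the smoker example concrete, here is a minimal sketch of inference in a tiny three-node Bayesian network (Smoker -> Cancer -> Test result) using Bayes' rule. All the probability values are made up for illustration and are not real medical statistics; the names are hypothetical as well.

```python
# Representation: conditional probability tables (hypothetical numbers).
P_CANCER_GIVEN_SMOKER = {True: 0.10, False: 0.01}    # P(cancer | smoker?)
P_POSITIVE_GIVEN_CANCER = {True: 0.90, False: 0.20}  # P(test positive | cancer?)


def p_cancer(smoker: bool, test_positive: bool) -> float:
    """Inference: P(cancer | smoker, test result) via Bayes' rule."""
    prior = P_CANCER_GIVEN_SMOKER[smoker]
    if test_positive:
        like_cancer = P_POSITIVE_GIVEN_CANCER[True]
        like_healthy = P_POSITIVE_GIVEN_CANCER[False]
    else:
        like_cancer = 1 - P_POSITIVE_GIVEN_CANCER[True]
        like_healthy = 1 - P_POSITIVE_GIVEN_CANCER[False]
    numerator = prior * like_cancer
    evidence = numerator + (1 - prior) * like_healthy
    return numerator / evidence


print("smoker, positive test:    ", round(p_cancer(True, True), 3))
print("non-smoker, positive test:", round(p_cancer(False, True), 3))
```

With the same test result, the system reports a higher probability of cancer for the smoker, which is exactly the kind of interaction between inputs that the graphical model is meant to capture.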

But before we can do inference with the system, we have to train it with some prior data - Learning. This data is called the training set. The training data helps the system capture correlations between inputs. This kind of system helps us deal with the uncertainty that lies within these types of applications and gives us results in terms of probabilities.

A few other examples where this technique is useful are robot navigation, text analysis, speech recognition (where we answer probabilistically about what word it might be) and gene regulatory networks. I am working in the same area (called Probabilistic Graphical Models) for my master's project (CS 298).
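
As a rough sketch of the Learning phase, the probabilities in a node's table can be estimated by simply counting over the training set. The eight patient records below are invented for illustration, and `estimate_p_cancer` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical training set: one (is_smoker, has_cancer) record per patient.
training_set = [
    (True, True), (True, False), (True, False), (True, True),
    (False, False), (False, False), (False, False), (False, True),
]


def estimate_p_cancer(records):
    """Estimate P(cancer | smoker?) as a relative frequency in the data."""
    totals = Counter()
    with_cancer = Counter()
    for is_smoker, has_cancer in records:
        totals[is_smoker] += 1
        if has_cancer:
            with_cancer[is_smoker] += 1
    return {group: with_cancer[group] / totals[group] for group in totals}


learned = estimate_p_cancer(training_set)
print("P(cancer | smoker)     =", learned[True])
print("P(cancer | non-smoker) =", learned[False])
```

A real system would use far more data and usually some smoothing so that rare combinations do not end up with a probability of exactly zero.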

Saturday, November 16, 2013

Computer Science: Historical Perspective

Today's computers are the result of years of thorough research and innovation. This is not the end. Computers are going to get more powerful and innovative in the years to come. It's interesting to know how computers evolved into what they are today.

Computers are meant to do computing, taking the place of "human computers". Before computers existed, humans did computations (e.g. mathematical ones) by hand. The first person to envision machines doing computation was Charles Babbage in the 1830s. He is often called the "Father of the computer". His work on machines for arithmetic proved to be so important that many more complex machines were later designed based on it. This eventually gave birth to modern computers by the 1950s. Charles Babbage's machine was referred to as the "Analytical Engine", a very first representation of the modern computer.

From the 1950s onward, several advancements were made in various parts of computing. Grace Murray Hopper developed the notion of a compiler, John Backus developed FORTRAN, one of the first high-level programming languages, and Jack Kilby and Robert Noyce invented the integrated circuit. Computer networks also advanced with ARPAnet, which proved to be a precursor to today's Internet. Other areas such as operating systems, database theory and computer architecture also advanced over the subsequent twenty years.

These advancements eventually led to the introduction of personal computers, which hit the market for the very first time between 1974 and 1977. The Scelbi, Mark-8, Altair, IBM 5100 and RadioShack's TRS-80 were some of the computers launched during this period. Steve Jobs and Steve Wozniak introduced the Apple computer in 1976.

The term "PC - Personal Computer" became popular when IBM released its first personal computer, code-named "Acorn", in 1981 (shown in the picture above), running Microsoft's MS-DOS operating system. Following the release of Apple's first computer with a GUI, Microsoft responded with the release of the Windows operating system. Since then the war has continued and many gradual advancements have been made to computers. As a result, computers today are an integral part of every possible engineering, medical and science field. They have brought a revolution to each of these fields and have made our lives much easier and safer.

Saturday, November 9, 2013

File Sharing: The Cloud Way

The Internet is constantly changing the way people communicate and collaborate. I remember a time when files were shared using e-mail, USB drives, CDs and even floppy disks. There were many limitations to using such media. It was difficult for people in different parts of the globe to share files using physical media, and e-mail typically only supported attachments up to about 25 MB. There were other options too, such as peer-to-peer file sharing (the BitTorrent protocol, for example), but they were more or less inconvenient.

Today, the way people communicate and collaborate over the internet has completely changed. People can not only share files using the internet, but can also collaborate to create and edit various types of documents such as Word documents, PowerPoint presentations and spreadsheets. There are numerous file hosting services that run in the cloud and provide an easy way of sharing files with other users.

Such file sharing systems are based on cloud storage. Cloud storage is a form of distributed computing where a large number of storage and computing devices are interconnected by a high-speed network to provide fast access to data, and the data is replicated across the globe. Cloud storage basically lets users store data online instead of locally on their machines. There are many great advantages, such as the ones below.

1. Files are always accessible from a browser from any machine around the globe with an internet connection.

2. Users can browse and upload files from their mobile devices such as phones and tablets.

3. User files are backed up at different locations (data centers) around the globe. It's much safer to keep important files in the cloud, where they are unlikely to be lost.

4. Files can be shared with others instantly with just a few clicks.

5. Users can collaborate over the internet to create and edit various types of documents.

6. For most everyday use, there is no practical limit on the size of a single file stored in the cloud.

Some popular file sharing services are Dropbox, Google Drive, Amazon Cloud Drive, Microsoft SkyDrive and Apple iCloud.

Friday, November 1, 2013

Data Structures: Hashes

Computers store data in data structures. Computer scientists write algorithms to access data from a data structure. It's good practice to have a broad view of the variety of data structures so that you can choose an appropriate one for your need. An inappropriate choice of data structure tends to increase data access time, no matter how powerful a computer you might have!

Arrays, lists, trees and hashes are some of the basic data structures commonly used. Each of these is suitable for one purpose or another. Stacks, queues and graphs can be termed abstract data types, since the underlying implementation can use any suitable basic data structure. For example, a stack can be based upon an array or a linked list.
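
To illustrate the last point, here is a minimal sketch of the same stack interface built on two different basic data structures. The class names `ArrayStack` and `LinkedStack` are my own; a Python list stands in for the array.

```python
class ArrayStack:
    """Stack backed by a Python list (a dynamic array)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        # list.pop() raises IndexError on an empty list.
        return self._items.pop()


class _Node:
    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node


class LinkedStack:
    """Stack backed by a singly linked list; the head is the top."""

    def __init__(self):
        self._head = None

    def push(self, item):
        self._head = _Node(item, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        node = self._head
        self._head = node.next
        return node.value


# Both behave the same from the user's point of view: last in, first out.
for stack in (ArrayStack(), LinkedStack()):
    for n in (1, 2, 3):
        stack.push(n)
    print(type(stack).__name__, stack.pop(), stack.pop(), stack.pop())
```

The code that uses the stack does not need to know which of the two is underneath, which is what makes the stack an abstract data type.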

A hash comes in handy when frequent and quick data access is required and data modification/deletion is not so frequent. The idea behind a hash is to have [ key : value ] pairs stored in a data structure. Multiple values can be stored for each key in a hash. Let's look at how data is stored in and accessed from a hash.

A hash takes two values: 1) input data and 2) input value. First, the hash function converts the "input data" to a key. The generated key then indicates the location at which to store the corresponding "input value". These locations are often called "buckets". Multiple "input data" can fall into the same bucket, and hence one key can have multiple associated values.

To access data from a hash, we again provide the "input data" to get the corresponding "input value" stored in the hash. Given an "input data", the hash applies the hash function, which returns the key indicating the location where the corresponding value was stored. The hash then accesses that bucket location and returns the stored value. When multiple values are stored for a given key, a comparison function has to be used in order to get the right "input value" for the given "input data".
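
The store and access steps described above can be sketched as a small hash table with buckets. This is only an illustration, not how any particular library implements it: the class name `BucketHash`, the bucket count and the use of Python's built-in `hash()` are my own assumptions. The bucket index computed in `_bucket_index` plays the role of the generated key from the text, and the `==` check is the comparison function.

```python
class BucketHash:
    """A toy hash: a fixed list of buckets, each holding (data, value) pairs."""

    def __init__(self, num_buckets: int = 8):
        self._buckets = [[] for _ in range(num_buckets)]

    def _bucket_index(self, input_data) -> int:
        # Hash function: turn the input data into a bucket location.
        return hash(input_data) % len(self._buckets)

    def put(self, input_data, input_value) -> None:
        bucket = self._buckets[self._bucket_index(input_data)]
        for i, (stored_data, _) in enumerate(bucket):
            if stored_data == input_data:  # already present: overwrite
                bucket[i] = (input_data, input_value)
                return
        bucket.append((input_data, input_value))

    def get(self, input_data, default=None):
        bucket = self._buckets[self._bucket_index(input_data)]
        for stored_data, stored_value in bucket:
            # Comparison step: several entries may share this bucket.
            if stored_data == input_data:
                return stored_value
        return default


table = BucketHash(num_buckets=1)  # one bucket forces everything to collide
table.put("apple", 1)
table.put("banana", 2)
print(table.get("apple"), table.get("banana"), table.get("cherry"))
```

With a single bucket every lookup has to walk through all stored pairs, which is the slow case; with enough buckets each one holds only a few pairs and lookups stay fast.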

Hash operations are quite efficient: read, write and delete take O(1) time in the best case and on average, and O(n) time in the worst case (when many items land in the same bucket), where "n" refers to the number of items in the hash.

An example where a hash can be used is as below:
Consider an application which wants to retrieve a person's phone number almost instantly given a name. A hash can be built from [ name : phone-number ] pairs, where the name is the key and the phone number is the corresponding value. Now whenever we want to find a person's phone number, we can feed that person's name to the application, which returns his/her phone number, if found, in O(1) time. This can be useful when, say, we have data for millions of people in our database and we want quick access to a given person's phone number.
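
In Python, the built-in `dict` is a hash, so the phone book example can be sketched in a few lines. The names and numbers below are made up for illustration.

```python
# [ name : phone-number ] pairs (hypothetical data).
phone_book = {
    "Alice": "555-0100",
    "Bob": "555-0101",
    "Carol": "555-0102",
}


def find_phone_number(name: str):
    """Return the phone number for name, or None if the name is not stored."""
    return phone_book.get(name)


print(find_phone_number("Bob"))
print(find_phone_number("Dave"))
```

The lookup does not scan the whole phone book, so it stays fast on average even when the dictionary holds millions of entries.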

Click here to watch an excellent video on Hash data structure.
