Why I Chose Python for Informatics and Data Science

My Coding Journey

Ever since I began programming in PHP and Java during my master's degree in 2014, I've been on a journey to continuously expand my knowledge and skills in computer science. Back then, I was eager and inspired to create a social media platform similar to Facebook, not realizing just how complex such a project would be.

This social media platform became my pet project, serving as a hands-on learning experience in developing real-world web applications. My ultimate goal was to eventually apply this knowledge to the healthcare sector. If you're curious about the progress of this project, feel free to visit diversees.com.

During the early stages of my coding journey, I gained hands-on experience setting up my local development environment with XAMPP, designing user interfaces using Bootstrap, and mapping out user workflows with tools like Balsamiq and Adobe XD. I also worked on manipulating HTML elements with JavaScript, managing databases with SQL and MariaDB, and handling backend processing with PHP. Developing this personal project was incredibly enjoyable and educational, as I had the freedom to choose the technologies and design patterns without any constraints.

Selecting the Right Programming Language for My Career

By 2018, I recognized the need to select a programming language that aligned with my career goals in health informatics and data science. At that point, I noticed that JavaScript, SQL, and PHP were not prominently featured in many informatics and data science projects, which seemed to limit my career prospects.

To address this, I decided to explore other programming languages. I conducted online research and consulted with colleagues for their recommendations. Given my interest in programming web applications, some of them suggested exploring C# (.NET), Ruby (Ruby on Rails), and even Rust.

After weeks of exploration, I discovered that R and Python were the dominant languages in informatics and data science. However, it’s worth noting that many programming languages can be utilized in these fields. If you’re currently deciding which language to invest in for your career, I’d like to share the reasons why I chose Python as my primary programming language.

Reasons for Choosing Python for Informatics and Data Science:

1. Python is a General-Purpose, High-Level Programming Language

Informatics and data science often involve data collection, cleaning, statistical computing, and visualization. While SAS and R frequently come up in these contexts, they were primarily developed for statistical analysis and have some limitations outside of that scope.

My goal was to find a versatile tool that could be applied across various projects, such as web development, web scraping, API interactions, data cleaning, data pipelines, and visualization. Since SAS and R are not well suited to web development, this was a significant drawback for me. Although R does offer the Shiny package for building interactive web apps, I found that it doesn't match the capabilities of general-purpose web frameworks like Django and Flask.
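As a small illustration of that versatility, the sketch below uses only Python's standard library to parse a JSON payload of the kind an API might return, clean it, and summarize it. The payload, the field names, and the clean_readings helper are all hypothetical and exist only for this example.

```python
import json
import statistics

# Hypothetical payload, standing in for the body of an API response.
RAW_PAYLOAD = """
[
  {"patient_id": "a1", "heart_rate": "72"},
  {"patient_id": "a2", "heart_rate": " 88 "},
  {"patient_id": "a3", "heart_rate": null},
  {"patient_id": "a4", "heart_rate": "n/a"}
]
"""


def clean_readings(records):
    """Return heart rates as integers, skipping missing or non-numeric values."""
    cleaned = []
    for record in records:
        value = record.get("heart_rate")
        if value is None:
            continue  # missing value
        text = str(value).strip()
        if text.isdigit():
            cleaned.append(int(text))
    return cleaned


records = json.loads(RAW_PAYLOAD)          # parsing
heart_rates = clean_readings(records)      # cleaning
mean_rate = statistics.mean(heart_rates)   # summarizing
print(f"{len(heart_rates)} valid readings, mean heart rate {mean_rate}")
```

In a real project the payload would come from an HTTP request and the cleaning would probably be done with a data package, but the same language covers every step.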

Ultimately, Python’s status as a general-purpose programming language means that it can easily adapt to emerging areas in computer science. The Python community is proactive in developing packages and tools to support new needs, making it a versatile choice for a wide range of applications.

2. Python’s Syntax is User-Friendly

Back in 2018, although R was gaining popularity, I struggled to grasp its concepts. The primary challenge was that R’s syntax and approach were quite different from my initial language, PHP. In contrast, I found Python straightforward and easy to understand. Its syntax is clean and intuitive, often reading like plain English with a little algebra mixed in when you work on your data.

One key piece of advice for learning Python is to familiarize yourself with object-oriented programming (OOP). Once you understand it, you will see that everything in Python, from numbers and strings to functions and classes, is an object with its own attributes and methods, which makes many aspects of the language easier to reason about.
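To make the "everything is an object" idea concrete, here is a minimal sketch. The Patient class and the double function are made-up names used only for illustration.

```python
class Patient:
    """Hypothetical class, defined only for this illustration."""

    def __init__(self, name):
        self.name = name

    def greeting(self):
        return f"Hello, {self.name}"


def double(x):
    return x * 2


# Numbers, strings, functions, classes, and instances are all objects.
examples = [42, "text", double, Patient, Patient("Ada")]
all_objects = all(isinstance(item, object) for item in examples)

# Because they are objects, they carry methods and attributes.
upper_text = "python".upper()      # strings have methods
bits = (42).bit_length()           # so do integers
function_name = double.__name__    # functions have attributes

# Functions can be passed around like any other value.
doubled = list(map(double, [1, 2, 3]))

print(all_objects, upper_text, bits, function_name, doubled)
```

Once this clicks, calling a method on a string, a list, or a data table all follows the same pattern: an object, a dot, and a method name.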

The bottom line is that if you're aiming to quickly grasp informatics and data science, Python is an excellent choice. Its ease of learning and use can make the journey more efficient compared to other programming languages.

3. Python is a Powerhouse for Informatics and Data Science Packages

In programming, a package is a collection of reusable code that someone else has written to solve a particular kind of problem. For instance, if you want to develop a full-featured web application, Django is a great choice. For building APIs, FastAPI is an excellent option. When it comes to data science, Python boasts a rich ecosystem of packages, including NumPy, pandas, Matplotlib, Seaborn, and Plotly.

It also excels in natural language processing and deep learning with packages like NLTK and TensorFlow. For those working with HL7 Fast Healthcare Interoperability Resources (FHIR), Python offers packages for FHIR resources and SMART on FHIR clients. For web scraping, Beautiful Soup and Scrapy are popular choices.
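FHIR resources are commonly exchanged as JSON, so it can help to look at one before reaching for a dedicated package. The sketch below reads a hand-written, minimal Patient resource using only the standard library. It does not use or represent the API of any FHIR package, and the sample data and the display_name helper are made up for this example.

```python
import json

# Hand-written sample, not real patient data.
PATIENT_JSON = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
  "gender": "female",
  "birthDate": "1985-02-14"
}
"""


def display_name(patient):
    """Join the given and family names of the first name entry, if any."""
    names = patient.get("name", [])
    if not names:
        return ""
    first = names[0]
    parts = first.get("given", []) + [first.get("family", "")]
    return " ".join(part for part in parts if part)


patient = json.loads(PATIENT_JSON)
if patient.get("resourceType") != "Patient":
    raise ValueError("expected a Patient resource")

full_name = display_name(patient)
print(full_name, patient["birthDate"])
```

A dedicated FHIR package would typically add validation and typed models on top of this kind of plain dictionary access.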

If you have limited time to dedicate to learning each day, it makes sense to invest your energy in a language that already has a rich ecosystem of packages relevant to your field. This is likely one reason why the Google Advanced Data Analytics Professional Certificate uses Python as the core language for its courses.

Conclusion

For aspiring data professionals, Python’s versatility and ease of use make it an ideal programming language for informatics and data science. Its general-purpose nature allows it to support a wide range of applications, from web development to data visualization. Its simple syntax shortens the learning curve, and its extensive package ecosystem lets you tackle diverse tasks efficiently. Lastly, its active community keeps those packages maintained and builds new ones as needs emerge, so you rarely have to look outside Python for additional tools.