Startups Using Pandas in Boston
Based on their job posts and on information submitted by the startups themselves, these are the Boston startups we've found using Pandas.
Interested in other technologies? Browse or search all of the built-in-boston tech stacks we've curated.
AI-powered tech for evaluating photos of a vehicle, detecting damage, and automating claim estimation.
Using computer vision & machine learning to turn human or animal body language into structured data, helping to develop better drugs for diseases like ALS, Parkinson’s, and Alzheimer’s.
“Predictive analytics platform to make data science automation a part of any organization.”
KAYAK for car insurance. Quick, personalized car insurance quotes from multiple providers.
Tech Stack Highlights
Python – We’re using Python for our core app, with Django/DRF powering our REST API, NLTK for NLP, and pandas running high-performance, real-time data analysis to calculate things like RateRank and savings estimates. We use Vagrant and Ansible for IT automation, and Jenkins and Selenium for QA automation and deployment to our AWS environment.
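A minimal sketch, assuming a hypothetical quotes DataFrame, of the kind of pandas aggregation that could feed a savings estimate (the column names and scoring logic here are illustrative, not the actual RateRank computation):

```python
import pandas as pd

# Hypothetical quote data; in production this would come from provider responses.
quotes = pd.DataFrame({
    "provider": ["A", "B", "C", "A"],
    "annual_premium": [1450.0, 1210.0, 1390.0, 1285.0],
    "current_premium": [1600.0, 1600.0, 1600.0, 1600.0],
})

# Savings relative to the driver's current premium.
quotes["savings"] = quotes["current_premium"] - quotes["annual_premium"]

# Best quote per provider, then an overall savings estimate.
best_per_provider = quotes.groupby("provider", as_index=False)["savings"].max()
estimated_savings = best_per_provider["savings"].max()
print(best_per_provider)
print(f"Estimated annual savings: ${estimated_savings:.0f}")
```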
MariaDB – Our database runs MariaDB on RDS for optimal MySQL-syntax performance. We crunch a lot of data in each query, so performance is key. Some of our queries approach 100 lines, with multiple nested subqueries, dozens of joins, and layered aggregation, and we run some queries thousands of times per day.
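To illustrate the general shape of those queries, here is a small, purely hypothetical nested, aggregated query run from Python via SQLAlchemy and pandas (the DSN, schema, and columns are placeholders):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder DSN; the real database is MariaDB on RDS.
engine = create_engine("mysql+pymysql://user:pass@rds-host/quotes")

sql = """
SELECT p.provider_name,
       AVG(q.monthly_premium) AS avg_premium,
       COUNT(*)               AS quote_count
FROM (
    SELECT provider_id, premium / 12 AS monthly_premium
    FROM quotes
    WHERE created_at >= NOW() - INTERVAL 30 DAY
) AS q
JOIN providers AS p ON p.id = q.provider_id
GROUP BY p.provider_name
ORDER BY avg_premium;
"""

# pandas returns the result set as a DataFrame for further analysis.
df = pd.read_sql(sql, engine)
print(df.head())
```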
Backbone – Backbone provided us “just enough” structure for our highly custom front-end MVC, while allowing us to build our own proprietary routing & workflow engine around it. We’re using Epoxy for two-way data binding, and jQuery + Bootstrap plugins, in addition to dozens of proprietary UI components.
Bootstrap – Our mobile-first, responsive CSS uses Bootstrap as a baseline, but builds on it to form a highly customized, well-organized, extensible style guide with our own unique components and layouts. We’re using SASS class extension, selector nesting, and custom mixins under the hood to generate our CSS.
Real-time admissions and discharge notifications link providers anywhere patients receive care.
Tech Stack Highlights
Spring Boot – We field a number of microservices on top of Spring Boot. Its convention-over-configuration design allows us to focus on business logic rather than plumbing. We’re particularly looking forward to the Spring team’s upcoming first-class support for Kotlin, which we’ve been gradually introducing as a safe, expressive alternative to Java 8.
React + Redux – We’ve built a highly interactive and engaging front-end using React and Redux. The resulting code is modular, easy to reason about, flexible, and composable.
Kafka – We use Kafka as our primary message bus. Unlike most “big data” technologies, Kafka has allowed us to scale without imposing a notable increase in complexity. In fact, because its append-only architecture allows us to view topic contents long after the message has been “consumed”, Kafka allows us to significantly improve monitoring and visibility over more traditional message buses (JMS, AMQP). We’re looking forward to experimenting with Kafka Streams as a lightweight alternative to standalone stream processing frameworks such as Spark.
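As a rough sketch of that replay property, a kafka-python consumer can re-read a topic from the earliest retained offset without committing, which is what makes after-the-fact inspection possible (the topic name and broker address below are placeholders):

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "patient-events",                   # hypothetical topic name
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",       # start from the oldest retained record
    enable_auto_commit=False,           # read-only inspection; don't advance offsets
    consumer_timeout_ms=5000,
)

for record in consumer:
    print(record.partition, record.offset, record.value[:80])
```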
Zeppelin – We use Apache Zeppelin to query, aggregate, and visualize data across a number of heterogeneous data sources, including MySQL, ElasticSearch, and S3. We write ‘notebooks’ in Scala and SQL to drive Spark in creating these visualizations. These notebooks can be ad hoc or shared, versioned, and parameterized.
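The notebooks themselves are written in Scala and SQL; as a loose Python (PySpark) approximation of the kind of aggregation they drive, with a placeholder JDBC URL, table, and columns:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("admissions-report").getOrCreate()

# Hypothetical MySQL source read over JDBC.
admissions = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host/clinical")
    .option("dbtable", "admissions")
    .option("user", "reporter")
    .option("password", "***")
    .load()
)

# Daily admission counts, the sort of series a Zeppelin chart would visualize.
daily = (
    admissions.groupBy(F.to_date("admitted_at").alias("day"))
    .agg(F.count("*").alias("admits"))
    .orderBy("day")
)
daily.show()
```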
NiFi – We use NiFi as an orchestration layer to manage real-time data flows in a simple, scalable way. The framework provides us with the ability to easily monitor the progress of messages as they move through the processing pipeline and to replay messages should it be necessary.
Platform for privately building & testing financial trading algorithms.
Interpreting and visualizing real-time patient data in the ICU for better critical care decision making.
Consumer travel discovery / search engine.
Employee performance prediction tools for hirers / recruiters.
Tech Stack Highlights
MySQL – MySQL provides the main data storage for all business-critical information such as user data, jobs, candidates, assessment metadata, etc. We use NDB Cluster as well as a fully redundant real-time backup server. Additionally, the data is archived hourly, daily, and weekly. When it comes to data security, nothing is ever too much.
MongoDB – Thousands of data points a minute stream to our servers in the form of user responses to pre-employment assessments. This data constitutes the main material for later analytics. Mongo’s sharding allows us to employ multiple low-cost instances to handle all this data in parallel. Like the MySQL data, the NoSQL data is fully redundant and backed up on a regular basis.
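A minimal pymongo sketch of storing one streamed assessment response (the database, collection, and field names are illustrative; in production the client would point at a mongos router in front of the sharded cluster):

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
responses = client["assessments"]["responses"]

# One incoming answer event; thousands of these arrive per minute.
responses.insert_one({
    "candidate_id": "c-123",
    "question_id": "q-42",
    "answer": 3,
    "received_at": datetime.now(timezone.utc),
})
```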
Python/R – Both Python and R are used to automate the data analytics required for creating job-success predictions. While Python provides a much more versatile and reliable development environment (especially with modules like NumPy and pandas), R still has advantages in certain areas. Python’s rpy2 module makes the two work together pretty decently.
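A small sketch of that Python/R handoff, assuming toy data and an illustrative R model: a pandas DataFrame is converted through rpy2 and fitted with R’s lm:

```python
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

# Toy data; real inputs would be assessment responses and performance outcomes.
scores = pd.DataFrame({
    "assessment": [71, 82, 64, 90],
    "performance": [2.9, 3.4, 2.5, 3.8],
})

# Convert the pandas DataFrame to an R data.frame.
with localconverter(ro.default_converter + pandas2ri.converter):
    r_scores = ro.conversion.py2rpy(scores)

# Fit a simple linear model in R against the converted data.
ro.globalenv["scores"] = r_scores
model = ro.r("lm(performance ~ assessment, data = scores)")
print(ro.r("summary")(model))
```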
Apache/PHP – Since our web application is a single-page app, the web service is mainly used as a REST-style backend that exchanges JSON payloads with the browser. Memcached lets us maintain a single state across all web instances, and tools like WKPDF handle server-side web rendering for creating downloadable materials.
JavaScript/Web MVP – On the client, we took a rather unorthodox approach and created our own MVP framework that connects seamlessly with the backend and makes the entire development cycle much faster. The framework we created (ElementsJS) makes use of jQuery as well as multiple open-source jQuery plug-ins, binding them together in a simple-to-use JavaScript API.