About SatSure:
SatSure is a deep-tech, decision-intelligence company that leverages Earth Observation (EO) data to solve crucial problems at the nexus of agriculture, finance, infrastructure, utilities, aviation, energy, climate, and other domains. Our goal is to create an impact for the other millions, with a focus on the developing world. We want to make insights from Earth observation data accessible to all.
With a founding team rooted in the Indian Institute of Space Science and Technology (IIST), the Indian Space Research Organisation (ISRO), and the Indian Institute of Remote Sensing (IIRS), and a leadership team with diverse backgrounds (IBM, Samsung, Intel, USC, IITKGP, IITG, IITM), we value technical innovation and scale. If you want to work in an environment that focuses on societal impact, is driven by cutting-edge technology, and gives you the freedom to pursue innovative ideas and be creative without hierarchies, SatSure is the place for you.
Data Science Intern, EO Data Team:
We are looking for a Data Science Intern in our EO Applied Data Science (EO Data) Team. Our mission is realized through foundational research and development in applied machine learning. Having solved a wide range of geospatial data science use cases, such as Land Use and Land Cover (LULC), Crop Classification, Sowing and Harvest Progression, Change Detection, Route Optimization, Satellite Image Time-Series (SITS) Classification, Image-to-Image (I2I) Translation, and Cross-Modal Fusion, we are now focusing on advancing next-generation Machine Learning (ML) applications and surpassing the State of the Art (SOTA), especially in more ambiguous and complex geographies.
We look forward to applying our research to critical products that touch the lives of millions of users, via revolutionary real-time and near-real-time large-scale software systems that process terabytes of data. At the core of such systems, we envision foundational geospatial data science models that are season-, modality-, and ground-agnostic.
We have been at the forefront of adaptable and efficient models, as evidenced by our publications at top ML and Geoscience and Remote Sensing (GRS) conferences.
Key Responsibilities:
- Collaborate with applied data scientists, MLOps engineers, geospatial experts, and platform engineers to envision solutions to real-world, ambiguous business use cases with low-latency/high-throughput requirements.
- Identify and solve assigned problems with simple, elegant solutions, working backwards from the desired requirements.
- Quickly propose and validate hypotheses to direct the science roadmap. Own time-bound, end-to-end (E2E) solutions for ML applications, spanning resource and requirements gathering, data collection, cleaning, and annotation, model development, and validation.
- Brainstorm, deep-dive into, implement, and debug the fundamentals of the systems (e.g., architectures, losses, efficiency, and serving), while writing clean code.
- Define appropriate data science metrics for evaluating outputs.
- Clearly communicate findings, verbally and in writing, to stakeholders of varied backgrounds, with close attention to detail.
- Initiate and engage in collaborative efforts to meet ambitious goals in applied research and product/client delivery.
- Innovate and advance State-of-the-Art (SOTA) in-house solutions, and communicate findings as IP (patents, papers) where the business deems it appropriate.
About You:
To be eligible for this role, we are looking for candidates with the following qualifications:
- Pursuing an M.Tech, MS (Research), or PhD in a technical field (e.g., CS, EE, EC, Remote Sensing), preferably at a leading academic or industrial lab, institute, or corporate. Undergraduate or dual-degree students with the research experience described below may also be considered.
- A proven track record of relevant experience in computer vision, NLP, learning theory, optimization, ML systems, foundational models, etc.
- Technical familiarity with some or most of the following (as evidenced by problem-solving skills in novel scenarios): Convolutional Neural Networks (CNNs), LSTMs/RNNs/GRUs, Transformers, UNet, YOLO, R-CNN, Encoder-Decoder Architectures, Generative Models (GAN, VAE, Diffusion), Contrastive Learning, Self-Supervised Learning, Semi-Supervised Learning, Representation Learning, Image Super-Resolution, Traditional Machine Learning (Classification, Regression, Clustering), Active Learning, Learning with Noisy Labels, Multimodal Learning, Synthetic Aperture Radar (SAR)/VV-VH bands, Normalized Difference Vegetation Index (NDVI), False Colour Composite (FCC), Dimensionality Reduction (PCA, UMAP, Isomap), Time-Series Modeling/Forecasting, Model Compression (Distillation, Pruning, Quantization), Automatic Mixed Precision Training, Fourier Neural Operator (FNO), Climate+AI, Domain Adaptation, Domain Generalization, Anomaly Detection, etc.
- Industry experience (0-1 years), if applicable, will also be considered.
- Candidates with prior publications in the main tracks or workshops of ICLR, CVPR, ICCV, ECCV, NeurIPS, ICML, AAAI, IJCAI, ACL, EMNLP, TACL, NAACL, TMLR, IGARSS, InGARSS, IEEE Transactions, etc., will have an edge (with preference for first-authored papers).
- Proficiency in at least one general-purpose programming language (preferably Python), along with strong hands-on experience with ML frameworks (e.g., PyTorch), particularly for training large, optimized, and scalable ML models.
- Experience with SQL, large-scale distributed systems (e.g., Spark), and MLOps is a plus.
- Strong verbal and written communication skills.