
data_science_interviews.md


What is PEP 8 and why is it important?

What is Scope in Python?

What are lists and tuples? What is the key difference between the two?

What are modules and packages in Python?

What is self in Python?

What are decorators in Python?

What is lambda in Python? Why is it used?

What are generators in Python?

Can you create a Series from a dictionary object in pandas?

How will you delete indices, rows, and columns from a data frame?

Can you get items of series A that are not available in another series B?

How are NumPy arrays advantageous over Python lists?

Write a Python function that takes a variable number of arguments.

WAP (Write a program) that takes a sequence of numbers and checks whether all the numbers are unique.
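One possible answer to the uniqueness question, as a minimal sketch: it assumes the numbers are hashable and stops at the first repeat. The function name `all_unique` is made up for illustration.

```python
def all_unique(numbers):
    """Return True if no value appears more than once in the sequence."""
    seen = set()
    for n in numbers:
        if n in seen:
            # A repeat was found, so the numbers are not all unique.
            return False
        seen.add(n)
    return True


print(all_unique([1, 2, 3, 4]))
print(all_unique([1, 2, 2, 4]))
```

Comparing `len(set(numbers))` with `len(numbers)` is a shorter alternative, though it always scans the whole sequence.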


How are eigenvalues and eigenvectors used in PCA (Principal Component Analysis)?

Difference between exogenous variables and autoregression in time series forecasting.

What is the difference between normalization and standardization? Should they be applied before or after the train-test split?

How do you reduce the impact of one feature relative to the others?

Difference between XGBoost and FBProphet.

Describe a scenario in which you would not make the data stationary in a time series forecasting problem.

BERT is trained on which dataset? What model would you use if BERT did not exist? Describe the self-attention mechanism.

Difference between univariate and multivariate time series forecasting problems.


Find the middle node of a given linked list. Use the two-pointer approach: at each iteration, advance the slow pointer by one node (slow = slow.next) and the fast pointer by two nodes (fast = fast.next.next), checking first that neither the fast pointer nor its next node is null. When the fast pointer reaches the end of the list, the slow pointer is at the middle node; print its data to get the result.
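The two-pointer idea can be sketched in Python as follows. The `Node` class and the `build_list` helper are hypothetical scaffolding for this example, and for an even-length list this version returns the second of the two middle nodes.

```python
class Node:
    """A singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def build_list(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def middle_node(head):
    """Return the middle node, or None for an empty list."""
    slow = fast = head
    # Stop when fast has run off the end or sits on the last node.
    while fast is not None and fast.next is not None:
        slow = slow.next          # one step
        fast = fast.next.next     # two steps
    return slow


print(middle_node(build_list([1, 2, 3, 4, 5])).data)
```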

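Print all the permutations of a given string. There are two approaches: either use a library routine (for example, itertools.permutations in Python) or code it by hand with loops or recursion. Note that a string of n distinct characters has n! permutations, so printing them all takes on the order of n * n! time rather than O(n^2).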
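Both approaches can be sketched in Python. The function names are made up for this example. The hand-written version is a plain recursion that fixes each character in turn and permutes the rest, and it does not deduplicate repeated characters.

```python
from itertools import permutations


def permutations_library(s):
    """All permutations of s using the standard library."""
    return ["".join(p) for p in permutations(s)]


def permutations_recursive(s):
    """All permutations of s using a hand-written recursion."""
    if len(s) <= 1:
        return [s]
    result = []
    for i, ch in enumerate(s):
        # Fix ch as the first character and permute the remaining ones.
        for rest in permutations_recursive(s[:i] + s[i + 1:]):
            result.append(ch + rest)
    return result


for p in permutations_recursive("abc"):
    print(p)
```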

Find the third-to-last node of a linked list. A variant of the two-pointer approach mentioned above works here as well, with one pointer kept a fixed number of nodes ahead of the other.
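A minimal sketch of that variant, assuming the lead pointer is first moved three nodes ahead and both pointers then advance in step. The `Node` class, `build_list`, and the generalized parameter `k` are hypothetical additions for this example.

```python
class Node:
    """A singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def build_list(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def kth_from_end(head, k=3):
    """Return the k-th node from the end, or None if the list is shorter than k."""
    lead = head
    for _ in range(k):
        if lead is None:
            return None  # fewer than k nodes in the list
        lead = lead.next
    trail = head
    # Move both pointers until lead runs off the end of the list.
    while lead is not None:
        lead = lead.next
        trail = trail.next
    return trail


print(kth_from_end(build_list([1, 2, 3, 4, 5])).data)
```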

Difference between call by value and call by reference. In call by value, a copy of the variable is passed to the function, whereas in call by reference the actual variable is passed. How is that done? The memory address of the variable is passed to the function. These concepts are used with pointers in C/C++.

Difference between == and === in JavaScript. Both are comparison operators. Double equals (==) compares values after type coercion, whereas triple equals (===) compares both the values and the data types of the LHS and RHS.

Difference between Breadth-first search & Depth first search.


Explanation of the past project. What were the features used and how did you determine performance?

What is the difference between linear regression and logistic regression?

What is the internal working of logistic regression (LR)?

What is the loss function of LR?

Name some hyperparameters used in LR? Why do we use regularization?

When do we use accuracy as a metric? When should we not use accuracy?

How do you deal with imbalanced data?

What is SMOTE and how is it different from stratified sampling?

Watch this video to understand how SMOTE works [https://www.youtube.com/watch?v=U3X98xZ4_no]

Which is better: a 0.51 AUC (Area Under the Curve) or a 0.43 F1 score? Which one should you present to a client?

Watch this video to understand how AUC is interpreted [https://www.youtube.com/watch?v=mUMd_cKU0VM]

What does the ROC AUC value signify?

Do we only use the threshold of 0.5 or can we use other thresholds in LR? If yes, how do we find them?

Can a sales forecasting model built on pencil sales data be used on eraser sales data?

How would you compare the performance of two forecasting models?

What are the different metrics used in regression analysis? Which metric should be used where?

How do you build a testing pipeline for a data science model? [https://www.kdnuggets.com/2020/08/unit-test-data-pipeline-thank-yourself-later.html]


How do iterators and generators work in Python?

What do Python constructors do, and how are they useful?

Explain what the map function does in Python.

How do you flatten an image (matrix) in a deep learning architecture?

Difference between semantic segmentation and instance segmentation?

What are the different types of pooling operations? What is the visual effect of applying a max pooling operation and an average pooling operation to an image?

What is the math behind the convolution operation? What will be the size of a particular image (128x128) after a convolution operation with a 3x3 kernel?

What will be the size of a particular image (128x128) after a convolution operation, and of a 3x3 image after applying a 1x1 kernel?
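For the two sizing questions above, the usual output-size relation is floor((n + 2p - k) / s) + 1 for input width n, kernel width k, padding p, and stride s. The sketch below assumes stride 1 and no padding, since the questions state neither. The function name is made up for illustration.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output width of a convolution over an n-wide input with a k-wide kernel."""
    return (n + 2 * padding - k) // stride + 1


# Assumed defaults: stride 1, no padding.
print(conv_output_size(128, 3))  # 128-wide image, 3-wide kernel
print(conv_output_size(128, 1))  # 128-wide image, 1-wide kernel
print(conv_output_size(3, 1))    # 3-wide image, 1-wide kernel
```

Under those assumptions a 3x3 kernel shrinks each side by 2, while a 1x1 kernel leaves the spatial size unchanged.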

What are the loss function and the optimization function of a region proposal network?

What is image downsampling, and why do we downsample?

Python coding: Solve the following using a for loop inside a function, and put the function inside a class.

#Input : a =[1,2,3]

#Output : ["hello1","hello2","hello3"]
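One way to meet the three constraints (a for loop, a function, a class), as a sketch. The class and method names are invented for this example.

```python
class Greeter:
    """Holds the method required by the exercise."""

    def add_prefix(self, numbers, prefix="hello"):
        """Return a list with the prefix attached to each number."""
        result = []
        for n in numbers:
            result.append(prefix + str(n))
        return result


a = [1, 2, 3]
print(Greeter().add_prefix(a))
```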

What is the trade-off between YOLO and Faster R-CNN in terms of speed and accuracy?

What are feature maps, and how are they obtained?


How will you count the unique values in a data frame column?

How will you convert a column's data type to string?

How will you obtain the correlation coefficient between two columns in a data frame?

How will you merge two data frames on a common column (when the column name is the same)?

How will you merge two data frames on a common column (when the column name differs between the left and right data frames)?

Define the term correlation with respect to statistics.

What are the types of correlation coefficients?

What is the difference between the Pearson correlation coefficient and the Spearman correlation coefficient?

How do we deal with categorical variables for statistical analysis?

How do you obtain the correlation between two categorical variables?

How do you find the correlation between one categorical variable and other numerical variables?

What is the difference between a dictionary and a list?

How do you append one dictionary to another dictionary?

What is the difference between tuples and lists?

Can a tuple contain elements of different data types?

How do you read data directly from a database and convert it into a data frame for analysis?

How do you import a function from file.py into another Python file?

What are generators in Python?

How will you print the indices and values of a list without using the range function?


What is the difference between Docker and Containers?

How do you restart containers on failure?

How do you run a container in Docker?

Can you run a program that takes 4 hours to run in AWS Lambda?

What is the difference between the ADD and COPY commands in a Dockerfile?

What is your experience with different AWS services such as CloudFormation or Glue?

What is the schema in S3?

Can a Lambda function written in AWS interact with other infrastructure?

What is the Dockerfile setup if you want to expose the model as an API?

Difference between UDFs, pandas UDFs, and PySpark UDFs?

Difference between synchronous and asynchronous requests? How do you program one in Python?

What is the use of a DAG (Directed Acyclic Graph) in Spark?

Given the number of terms, print the Fibonacci sequence. Hint: try both iterative and recursive methods. [https://www.programiz.com/python-programming/examples/fibonacci-sequence]
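Both methods from the hint can be sketched as follows, assuming the sequence starts at 0, 1. The function names are made up, and the recursive version is the naive one, which becomes slow for large inputs.

```python
def fib_iterative(n_terms):
    """Return the first n_terms Fibonacci numbers, starting from 0."""
    sequence = []
    a, b = 0, 1
    for _ in range(n_terms):
        sequence.append(a)
        a, b = b, a + b
    return sequence


def fib_recursive(i):
    """Return the Fibonacci number at index i (0-based), naive recursion."""
    if i < 2:
        return i
    return fib_recursive(i - 1) + fib_recursive(i - 2)


n_terms = 7
print(fib_iterative(n_terms))
print([fib_recursive(i) for i in range(n_terms)])
```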

Given an input string, print the length of the longest substring without any repeating characters. [https://leetcode.com/problems/longest-substring-without-repeating-characters/]
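A common way to approach this is a sliding window that remembers the last index of each character. The sketch below is one such version, not necessarily the expected interview answer, and the function name is made up.

```python
def longest_unique_substring_length(s):
    """Length of the longest substring of s with no repeated character."""
    last_seen = {}  # character -> index of its most recent occurrence
    start = 0       # left edge of the current window
    best = 0
    for i, ch in enumerate(s):
        # If ch already occurs inside the window, move the left edge past it.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best


print(longest_unique_substring_length("abcabcbb"))
```

Each character is visited once, so the running time is linear in the length of the string.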

Given an input string, write a function that returns the Run Length Encoded string for the input string. For example, if the input string is “ssslbbbbppiitttc”, then the function should return “s3l1b4p2i2t3c1”
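A straightforward single-pass sketch of run-length encoding in the format given above (each character followed by its run length). The function name is made up for illustration.

```python
def run_length_encode(s):
    """Encode s as each character followed by the length of its run."""
    if not s:
        return ""
    parts = []
    current = s[0]
    count = 1
    for ch in s[1:]:
        if ch == current:
            count += 1
        else:
            # The run has ended: record it and start a new one.
            parts.append(f"{current}{count}")
            current = ch
            count = 1
    parts.append(f"{current}{count}")  # record the final run
    return "".join(parts)


print(run_length_encode("ssslbbbbppiitttc"))
```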


Given a list, ls = [9,8,3,4,1,0,2,7,7,6], write a function to get the nth highest element without using any built-in functions or sorting.
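One possible approach is to make n passes over the list, each time finding the largest value strictly below the one found in the previous pass. The sketch below assumes duplicates count once (so the two 7s share a rank), which the question leaves open. The function name is made up, and only loops and comparisons are used.

```python
def nth_highest(values, n):
    """Return the n-th highest distinct value, or None if there is none."""
    upper = None  # value found in the previous pass
    found = 0
    while found < n:
        best = None
        for v in values:
            # Skip values already accounted for in earlier passes.
            if upper is not None and v >= upper:
                continue
            if best is None or v > best:
                best = v
        if best is None:
            return None  # fewer than n distinct values
        upper = best
        found += 1
    return upper


ls = [9, 8, 3, 4, 1, 0, 2, 7, 7, 6]
print(nth_highest(ls, 3))
```

This takes on the order of n passes over the list. If duplicates should count separately, the comparison logic would need to track positions rather than values.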

Write a Python class with a method to sort a list, plus related questions on classes, static methods, init, etc.

Difference between RANK and DENSE_RANK?

Difference between the Parquet and CSV file formats? How is data written in a Parquet file?

What is the cursor command in SQL?

Difference between the Spark and MapReduce architectures?

Explanation of an ETL pipeline.

Containerization vs. virtualization.

What is port redirection in Docker?

How to create a table with Databricks storage?

Difference between SQL and NoSQL databases?

In a scenario where the data keeps changing, with features being added and updated, would you choose SQL or NoSQL?

What is the difference between iterators and generators?

What is the difference between OLAP and OLTP?